Abstract. Hybridization networks are representations of evolutionary histories that allow
for the inclusion of reticulate events like recombinations, hybridizations, or lateral gene
transfers. The recent growth in the number of hybridization network reconstruction
algorithms has led to an increasing interest in the definition of metrics for their comparison
that can be used to assess the accuracy or robustness of these methods. In this paper
we establish some basic results that make possible the generalization to tree-child time
consistent (TCTC) hybridization networks of some of the oldest known metrics for
phylogenetic trees: those based on the comparison of the vectors of path lengths between leaves.
More specifically, we associate to each hybridization network a suitably defined vector of
'splitted' path lengths between its leaves, and we prove that if two TCTC hybridization
networks have the same such vectors, then they must be isomorphic. Thus, comparing
these vectors by means of a metric for real-valued vectors defines a metric for TCTC
hybridization networks. We also consider the case of fully resolved hybridization networks,
where we prove that simpler, 'non-splitted' vectors can be used.

1 Introduction

An evolutionary history is usually modelled by means of a rooted phylogenetic tree,
whose root represents a common ancestor of all species under study (or whatever other
taxonomic units are considered: genes, proteins, ...), the leaves, the extant species, and
the internal nodes, the ancestral species. But phylogenetic trees can only cope with
speciation events due to mutations, where each species other than the universal common
ancestor has only one parent in the evolutionary history (its parent in the tree). It is
clearly understood now that other speciation events, which cannot be properly represented
by means of single arcs in a tree, play an important role in evolution [10]. These
are reticulation events like genetic recombinations, hybridizations, or lateral gene
transfers, where a species is the result of the interaction between two parent species. This
has led to the introduction of networks as models of phylogenetic histories that capture
these reticulation events side by side with the classical mutations.

Contrary to what happens in the phylogenetic trees literature, where the basic con- 
cepts are well established, there is still some lack of consensus about terminology in the 
field of 'phylogenetic networks' [16]. Following [23], in this paper we use the term
hybridization network to denote the most general model of reticulated evolutionary history:
a directed acyclic graph with only one root, which represents the last universal common
ancestor and which we assume, thus, of out-degree greater than 1. In such a graph, nodes
represent species (or any other taxonomic unit) and arcs represent direct descendance.
A node with only one parent (a tree node) represents a species derived from its par- 
ent species through mutation, and a node with more than one parent (a hybrid node) 
represents a species derived from its parent species through some reticulation event. 

The interest in representing phylogenetic histories by means of networks has led
to many hybridization network reconstruction methods [13,14,17-19,21,25,28]. These
reconstruction methods often search for hybridization networks satisfying some restric- 
tion, like for instance to have as few hybrid nodes as possible (in perfect phylogenies), 
or to have their reticulation cycles satisfying some structural restrictions (in galled trees 
and networks). Two popular and biologically meaningful such restrictions are the time 
consistency [1, 18], the possibility of assigning times to the nodes in such a way that 
tree children exist later than their parents and hybrid children coexist with their par- 
ents (and in particular, the parents of a hybrid species coexist in time), and the tree 
child condition [8,31], that imposes that every non-extant species has some descendant 
through mutation alone. The tree-child time consistent (TCTC) hybridization networks 
have been recently proposed as the class where meaningful phylogenetic networks should 
be searched [30]. Recent simulations (reported in [27]) have shown that over 64% of 
4132 hybridization networks obtained using the coalescent model [15] under various pop- 
ulation and sample sizes, sequence lengths, and recombination rates, were TCTC: the 
percentage of TCTC networks among the time consistent networks obtained in these 
simulations increases to 92.8%. 

The increase in the number of available hybridization network reconstruction
algorithms has made necessary the introduction of methods for the comparison of
hybridization networks to be used in their assessment, for instance by comparing inferred
networks with either simulated or true phylogenetic histories, and by evaluating the
robustness of reconstruction algorithms when adding new species [18,32]. This has led
recently to the definition of several metrics on different classes of hybridization
networks [4-6,8,9,18,20]. All these metrics generalize in one way or another well-known
metrics for phylogenetic trees.

Some of the most popular metrics for phylogenetic trees are based on the comparison 
of the vectors of path lengths between leaves [3,11,12,22,26,29]. Introduced in the early 
seventies, with different names depending on the author and the way these vectors are 
compared, they are globally known as nodal distances. Actually, these vectors of path
lengths only separate (in the sense that equal vectors mean isomorphic trees), on the
one hand, unrooted phylogenetic trees, and, on the other hand, fully resolved rooted
phylogenetic trees, and therefore, as far as rooted phylogenetic trees go, the distances
defined through these vectors are only true metrics for fully resolved trees. These metrics
were recently generalized to arbitrary rooted phylogenetic trees [7]. In this generalization,
each path length between two leaves was replaced by the pair of distances from the
leaves to their least common ancestor, and the vector of path lengths between leaves
was replaced by the splitted path lengths matrix obtained in this way. These matrices
separate arbitrary rooted phylogenetic trees, and therefore the splitted nodal distances
defined through them are indeed metrics on the space of rooted phylogenetic trees.

In a recent paper [6] we have generalized these splitted nodal distances to TCTC
hybridization networks with all their hybrid nodes of out-degree 1. The goal of this
paper is to go one step beyond in two directions: to generalize to the TCTC hybridization 
networks setting both the classical nodal distances for fully resolved rooted phylogenetic 
trees and the new splitted nodal distances for rooted phylogenetic trees. Thus, on the one 
hand, we introduce a suitable generalization of the vectors of path lengths between leaves 
that separate fully resolved (where every non extant species has exactly two children, 
and every reticulation event involves exactly two parent species) TCTC hybridization 
networks. On the other hand, we show that if we split these new path lengths in a suitable 
way and we add a bit of extra information, the resulting vectors separate arbitrary 
TCTC hybridization networks. Then, the vectors obtained in both cases can be used to 
define metrics that generalize, respectively, the nodal distances for fully resolved rooted 
phylogenetic trees and the splitted nodal distances for rooted phylogenetic trees. 

The key ingredient in the proofs of our main results is the use of sets of suitable
reductions that, applied to TCTC hybridization networks with $n$ leaves and $m$ internal
nodes, produce TCTC hybridization networks with either $n-1$ leaves or with $n$ leaves and
$m-1$ internal nodes (in the fully resolved case, the reductions we use are specifically
tailored to make them always remove one leaf). Similar sets of reductions have already been
introduced for TCTC hybridization networks with all their hybrid nodes of out-degree
1 [6] and for tree sibling (where every hybrid node has a tree sibling) time consistent
hybridization networks with all their hybrid nodes of in-degree 2 and out-degree 1 [4],
and they have proved useful in those contexts not only to establish properties of the
corresponding networks by algebraic induction, but also to generate in a recursive way
all networks of the type under consideration. We hope that the reductions introduced in
this paper will find similar applications elsewhere.

2 Preliminaries 

2.1 Notations on DAGs 

Let $N = (V,E)$ denote in this subsection a directed acyclic (non-empty, finite) graph;
a DAG, for short. A node $v \in V$ is a child of $u \in V$ if $(u,v) \in E$; we also say in this
case that $u$ is a parent of $v$. All children of the same parent are said to be siblings of each
other.

Given a node $v \in V$, its in-degree $\deg_{in}(v)$ and its out-degree $\deg_{out}(v)$ are,
respectively, the number of its parents and the number of its children. The type of $v$ is the
ordered pair $(\deg_{in}(v), \deg_{out}(v))$. A node $v$ is a root when $\deg_{in}(v) = 0$, a tree node
when $\deg_{in}(v) \le 1$, a hybrid node when $\deg_{in}(v) \ge 2$, a leaf when $\deg_{out}(v) = 0$, internal
when $\deg_{out}(v) \ge 1$, and elementary when $\deg_{in}(v) \le 1$ and $\deg_{out}(v) = 1$. A tree arc
(respectively, a hybridization arc) is an arc with head a tree node (respectively, a hybrid
node). A DAG $N$ is rooted when it has only one root.



A path on $N$ is a sequence of nodes $(v_0, v_1, \ldots, v_k)$ such that $(v_{i-1}, v_i) \in E$ for all
$i = 1, \ldots, k$. We call $v_0$ the origin of the path, $v_1, \ldots, v_{k-1}$ its intermediate nodes, and
$v_k$ its end. The length of the path $(v_0, v_1, \ldots, v_k)$ is $k$, and it is non-trivial if $k \ge 1$. The
acyclicity of $N$ means that it does not contain cycles: non-trivial paths from a node to
itself.

We denote by $u \rightsquigarrow v$ any path with origin $u$ and end $v$. Whenever there exists a path
$u \rightsquigarrow v$, we shall say that $v$ is a descendant of $u$ and also that $u$ is an ancestor of $v$. When
the path $u \rightsquigarrow v$ is non-trivial, we say that $v$ is a proper descendant of $u$ and that $u$ is a
proper ancestor of $v$. The distance from a node $u$ to a descendant $v$ is the length of a
shortest path from $u$ to $v$.

The height $h(v)$ of a node $v$ in a DAG $N$ is the largest length of a path from $v$ to a
leaf. The absence of cycles implies that the nodes of a DAG can be stratified by means of
their heights: the nodes of height 0 are the leaves, the nodes of height 1 are those nodes
all of whose children are leaves, the nodes of height 2 are those nodes all of whose children are
leaves and nodes of height 1, and so on. If a node has height $m > 0$, then all its children
have height smaller than $m$, and at least one of them has height exactly $m-1$.
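
As an illustration only, the stratification by heights can be sketched in a few lines of Python. The encoding of a DAG as a dictionary mapping each node to the list of its children, as well as the names `heights` and `EXAMPLE`, are assumptions of this sketch and not part of the formal development.

```python
def heights(children):
    """Map every node to its height: the largest length of a path to a leaf."""
    memo = {}

    def h(v):
        if v not in memo:
            kids = children[v]
            # Leaves have height 0; otherwise one more than the tallest child.
            memo[v] = 0 if not kids else 1 + max(h(c) for c in kids)
        return memo[v]

    return {v: h(v) for v in children}


# Hypothetical DAG: root r with children a and leaf 3; a has the leaves 1 and 2.
EXAMPLE = {"r": ["a", 3], "a": [1, 2], 1: [], 2: [], 3: []}
print(heights(EXAMPLE))
```

In this toy DAG the root has height 2 although one of its children (the leaf 3) has height 0: only one child is required to have height exactly $m-1$.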

A node $v$ of $N$ is a strict descendant of a node $u$ if it is a descendant of it, and every
path from a root of $N$ to $v$ contains the node $u$: in particular, we understand every node
as a strict descendant of itself. When $v$ is a strict descendant of $u$, we also say that $u$ is
a strict ancestor of $v$.

The following lemma will be used several times in this paper. 

Lemma 1. Let $u$ be a proper strict ancestor of a node $v$ in a DAG $N$, and let $w$ be an
intermediate node in a path $u \rightsquigarrow v$. Then, $u$ is also a strict ancestor of $w$.

Proof. Let $r \rightsquigarrow w$ be a path from a root of $N$ to $w$, and concatenate to it the piece $w \rightsquigarrow v$
of the path $u \rightsquigarrow v$ under consideration. This yields a path $r \rightsquigarrow v$ that must contain $u$.
Since $u$ does not appear in the piece $w \rightsquigarrow v$, we conclude that it is contained in the path
$r \rightsquigarrow w$. This proves that every path from a root of $N$ to $w$ contains the node $u$. □

For every pair of nodes $u,v$ of $N$:

- $CSA(u,v)$ is the set of all common ancestors of $u$ and $v$ that are strict ancestors of
  at least one of them;
- the least common semi-strict ancestor (LCSA) of $u$ and $v$, in symbols $[u,v]$, is the
  node in $CSA(u,v)$ of minimum height.

The LCSA of two nodes $u,v$ in a phylogenetic network is well defined and it is unique:
it is actually the unique element of $CSA(u,v)$ that is a descendant of all elements of
this set [5]. The following result on LCSAs will be used often. It is the generalization to
DAGs of Lemma 6 in [6], and we include its easy proof for the sake of completeness.
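
The definitions of strict ancestor, $CSA(u,v)$ and $[u,v]$ translate directly into a brute-force computation. The following Python sketch is only meant to make the definitions concrete: the dictionary encoding of the DAG, the function names, and the small network `NET` (root r with children a, b, c; hybrid leaves 3 and 4) are hypothetical and chosen for illustration.

```python
def descendants(children, u):
    """All nodes reachable from u, including u itself (trivial path)."""
    seen, stack = {u}, [u]
    while stack:
        for c in children[stack.pop()]:
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen


def is_strict_ancestor(children, u, v):
    """True when every path from a root to v contains u."""
    if u == v:
        return True
    has_parent = {c for kids in children.values() for c in kids}
    roots = [x for x in children if x not in has_parent]
    # Explore from the roots while refusing to pass through u.
    stack = [r for r in roots if r != u]
    seen = set(stack)
    while stack:
        for c in children[stack.pop()]:
            if c != u and c not in seen:
                seen.add(c)
                stack.append(c)
    return v not in seen


def height(children, v):
    kids = children[v]
    return 0 if not kids else 1 + max(height(children, c) for c in kids)


def csa(children, u, v):
    """Common ancestors of u and v that are strict ancestors of at least one."""
    return [x for x in children
            if {u, v} <= descendants(children, x)
            and (is_strict_ancestor(children, x, u)
                 or is_strict_ancestor(children, x, v))]


def lcsa(children, u, v):
    """The element of CSA(u, v) of minimum height."""
    return min(csa(children, u, v), key=lambda x: height(children, x))


# Hypothetical network: leaves 3 and 4 are hybrid (parents a,b and b,c).
NET = {"r": ["a", "b", "c"], "a": [1, 3], "b": [2, 3, 4], "c": [5, 4],
       1: [], 2: [], 3: [], 4: [], 5: []}
print(lcsa(NET, 1, 3), lcsa(NET, 2, 3), lcsa(NET, 3, 4))
```

In `NET`, the node b is a common ancestor of the leaves 3 and 4 but a strict ancestor of neither, so it does not belong to $CSA(3,4)$ and the LCSA of 3 and 4 is the root.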

Lemma 2. Let $N$ be a DAG and let $u,v$ be a pair of nodes of $N$ such that $v$ is not a
descendant of $u$. If $u$ is a tree node with parent $u'$, then $[u,v] = [u',v]$.

Proof. We shall prove that $CSA(u,v) = CSA(u',v)$.

Let $x \in CSA(u,v)$. Since $u$ is not an ancestor of $v$, $x \ne u$ and hence any path
$x \rightsquigarrow u$ is non-trivial. Then, since $u'$ is the only parent of $u$, it appears in this path, and
therefore $x$ is also an ancestor of $u'$. This shows that $x$ is a common ancestor of $u'$ and
$v$. Now, if $x$ is a strict ancestor of $v$, we already conclude that $x \in CSA(u',v)$, while if
$x$ is a strict ancestor of $u$, it will be also a strict ancestor of $u'$ by Lemma 1, and hence
$x \in CSA(u',v)$, too. This proves that $CSA(u,v) \subseteq CSA(u',v)$.

Conversely, let $x \in CSA(u',v)$. Since $u'$ is the parent of $u$, it is clear that $x$ is
a common ancestor of $u$ and $v$, too. If $x$ is a strict ancestor of $v$, this implies that
$x \in CSA(u,v)$. If $x$ is a strict ancestor of $u'$, then it is also a strict ancestor of $u$ (every
path $r \rightsquigarrow u$ must contain the only parent $u'$ of $u$, and then $x$ will belong to the piece
$r \rightsquigarrow u'$ of the path $r \rightsquigarrow u$), and therefore $x \in CSA(u,v)$, too. This finishes the proof of
the equality. □

Let $S$ be any non-empty finite set of labels. We say that the DAG $N$ is labeled in $S$,
or that it is an $S$-DAG, for short, when its leaves are bijectively labeled by elements of $S$.
Although in real applications the set $S$ would correspond to a given set of extant taxa,
for the sake of simplicity we shall assume henceforth that $S = \{1, \ldots, n\}$, with $n = |S|$.
We shall always identify, usually without any further notice, each leaf of an $S$-DAG with
its label in $S$.

Two $S$-DAGs $N, N'$ are isomorphic, in symbols $N \cong N'$, when they are isomorphic
as directed graphs and the isomorphism maps each leaf in $N$ to the leaf with the same
label in $N'$.

2.2 Path lengths in phylogenetic trees 

A phylogenetic tree on a set $S$ of taxa is a rooted $S$-DAG without hybrid nodes and such
that its root is non-elementary. A phylogenetic tree is fully resolved, or binary, when
every internal node has out-degree 2. Since all ancestors of a node in a phylogenetic tree
are strict, the LCSA $[u,v]$ of two nodes $u,v$ in a phylogenetic tree is simply their least
common ancestor: the unique common ancestor of them that is a descendant of every
other common ancestor of them.

Let $T$ be a phylogenetic tree on the set $S = \{1, \ldots, n\}$. For every $i,j \in S$, we shall
denote by $\ell_T(i,j)$ and $\ell_T(j,i)$ the lengths of the paths $[i,j] \rightsquigarrow i$ and $[i,j] \rightsquigarrow j$, respectively.
In particular, $\ell_T(i,i) = 0$ for every $i = 1, \ldots, n$.

Definition 1. Let $T$ be a phylogenetic tree on the set $S = \{1, \ldots, n\}$. The path length
between two leaves $i$ and $j$ is

$$L_T(i,j) = \ell_T(i,j) + \ell_T(j,i).$$

The path lengths vector of $T$ is the vector

$$L(T) = \big(L_T(i,j)\big)_{1 \le i < j \le n} \in \mathbb{N}^{n(n-1)/2}$$

with its entries ordered lexicographically in $(i,j)$.



The following result is a special case of Prop. 2 in [7].

Proposition 1. Two fully resolved phylogenetic trees on the same set $S$ of taxa are
isomorphic if, and only if, they have the same path lengths vectors. □

The thesis in the last result is false for arbitrary phylogenetic trees. Consider for
instance the phylogenetic trees with Newick strings (1,2,(3,4)); and ((1,2),3,4);
depicted¹ in Fig. 1. It is straightforward to check that they have the same path lengths
vectors, but they are not isomorphic.
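
The check is indeed mechanical. As a hedged illustration, the following Python sketch computes the path lengths vectors of the two trees above; the dictionary encoding (node to list of children), the internal node names `r`, `a`, `b`, and all function names are assumptions made for this example.

```python
from itertools import combinations


def split_lengths(children, i, j):
    """Distances from the least common ancestor of i and j to i and to j."""
    parent = {c: p for p, kids in children.items() for c in kids}

    def path_to_root(v):
        path = [v]
        while path[-1] in parent:
            path.append(parent[path[-1]])
        return path

    up_i, up_j = path_to_root(i), path_to_root(j)
    on_j = set(up_j)
    # First node on the way up from i that is also an ancestor of j.
    lca = next(v for v in up_i if v in on_j)
    return up_i.index(lca), up_j.index(lca)


def path_lengths_vector(children, n):
    """Entries L_T(i, j) for 1 <= i < j <= n, in lexicographic order."""
    return [sum(split_lengths(children, i, j))
            for i, j in combinations(range(1, n + 1), 2)]


# The trees (1,2,(3,4)); and ((1,2),3,4); with hypothetical internal names.
T1 = {"r": [1, 2, "a"], "a": [3, 4], 1: [], 2: [], 3: [], 4: []}
T2 = {"r": ["b", 3, 4], "b": [1, 2], 1: [], 2: [], 3: [], 4: []}
print(path_lengths_vector(T1, 4))
print(path_lengths_vector(T2, 4))
```

Both calls return the vector (2, 3, 3, 3, 3, 2), although the root of the first tree has two leaf children and the root of the second one has only leaves 3 and 4 as leaf children, so no label-preserving isomorphism exists.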





Fig. 1. Two non-isomorphic phylogenetic trees with the same path lengths vectors.

This problem was overcome in [7] by replacing the path lengths vectors by the fol- 
lowing matrices of distances. 

Definition 2. The splitted path lengths matrix of $T$ is the $n \times n$ square matrix

$$\ell(T) = \big(\ell_T(i,j)\big)_{\substack{i=1,\ldots,n \\ j=1,\ldots,n}} \in \mathcal{M}_n(\mathbb{N}).$$

Now, the following result is (again, a special case of) Theorem 11 in [7]. 

Proposition 2. Two phylogenetic trees on the same set $S$ of taxa are isomorphic if, and
only if, they have the same splitted path lengths matrices. □
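
To see Proposition 2 at work on the two trees with Newick strings (1,2,(3,4)); and ((1,2),3,4);, the following Python sketch builds their splitted path lengths matrices. As before, the dictionary encoding of the trees and the names used are illustrative assumptions, not notation from [7].

```python
def dist_from_lca(children, i, j):
    """l_T(i, j): distance from the least common ancestor of i and j to i."""
    parent = {c: p for p, kids in children.items() for c in kids}

    def path_to_root(v):
        path = [v]
        while path[-1] in parent:
            path.append(parent[path[-1]])
        return path

    ancestors_of_j = set(path_to_root(j))
    # Count the arcs climbed from i until an ancestor of j is reached.
    for steps, v in enumerate(path_to_root(i)):
        if v in ancestors_of_j:
            return steps
    raise ValueError("nodes lie in different components")


def splitted_matrix(children, n):
    """The n x n matrix with entry (i, j) equal to l_T(i, j)."""
    return [[dist_from_lca(children, i, j) for j in range(1, n + 1)]
            for i in range(1, n + 1)]


# Same hypothetical encodings of (1,2,(3,4)); and ((1,2),3,4); as above.
T1 = {"r": [1, 2, "a"], "a": [3, 4], 1: [], 2: [], 3: [], 4: []}
T2 = {"r": ["b", 3, 4], "b": [1, 2], 1: [], 2: [], 3: [], 4: []}
for row in splitted_matrix(T1, 4):
    print(row)
for row in splitted_matrix(T2, 4):
    print(row)
```

The two matrices already differ in the entry (1,3): it is 1 in the first tree and 2 in the second one, so the splitted matrices tell apart the two trees that the path lengths vectors could not.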

3 TCTC networks 

While the basic notion of phylogenetic tree is well established, the notion of phylogenetic
network is much less well defined [16]. The networks we consider in this paper are the
(almost) most general possible ones: rooted $S$-DAGs with non-elementary root. Following
[23], we shall call them hybridization networks. In these hybridization networks, every
node represents a different species, and the arcs represent direct descendance, be it
through mutation (tree arcs) or through some reticulation event (hybridization arcs).

It is usual to forbid elementary nodes in hybridization networks [23], mainly because
they cannot be reconstructed. We allow them here for two reasons. On the one hand,
because allowing them simplifies considerably some proofs, as it will be hopefully clear
in Section 5. On the other hand, because, as Moret et al. point out [18, §4.3], they can be
useful both from the biological point of view, to include auto-polyploidy in the model, as
well as from the formal point of view, to make a phylogeny satisfy other constraints, like
for instance time consistency (see below) or the impossibility of successive hybridizations.
Of course, our main results apply without any modification to hybridization networks
without elementary nodes as well.

¹ Henceforth, in graphical representations of DAGs, hybrid nodes are represented by squares, tree nodes
by circles, and indeterminate nodes, that is, nodes that can be of tree or hybrid type, by squares with
rounded corners.

Following [5], by a phylogenetic network on a set $S$ of taxa we understand a rooted
$S$-DAG $N$ with non-elementary root where every hybrid node has exactly one child, and it is
a tree node. Although, from the mathematical point of view, phylogenetic networks are a
special case of hybridization networks, from the point of view of modelling they represent
in a different way evolutionary histories with reticulation events: in a phylogenetic network,
every tree node represents a different species and every hybrid node, a reticulation event
that gives rise to the species represented by its only child.

A hybridization network $N = (V,E)$ is time consistent when it allows a temporal
representation [1]: a mapping

$$\tau : V \to \mathbb{N}$$

such that $\tau(u) < \tau(v)$ for every tree arc $(u,v)$ and $\tau(u) = \tau(v)$ for every hybridization
arc $(u,v)$. Such a temporal representation can be understood as an assignment of times
to nodes that strictly increases from parents to tree children and so that the parents of
each hybrid node coexist in time.

Remark 1. Let $N = (V,E)$ be a time consistent hybridization network, and let $N_1 =
(V_1,E_1)$ be a hybridization network obtained by removing from $N$ some nodes and all
their descendants (as well as all arcs pointing to any removed node). Then $N_1$ is still
time consistent, because the restriction of any temporal representation $\tau : V \to \mathbb{N}$ of $N$
to $V_1$ yields a temporal representation of $N_1$.

A hybridization network satisfies the tree-child condition, or it is tree-child, when
every internal node has at least one child that is a tree node (a tree child). So, tree-child
hybridization networks can be understood as general models of reticulate evolution where
every species other than the extant ones, represented by the leaves, has some descendant
through mutation. Tree-child hybridization networks include galled trees [13, 14] as a
particular case [8].
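
Both conditions are easy to test on a concrete network once a candidate assignment of times is given. The Python sketch below is a minimal illustration under stated assumptions: the network is encoded as a dictionary from nodes to lists of children, the names `NET` and `TAU` are hypothetical, and the code only verifies a given mapping $\tau$; it does not search for one.

```python
def parents_of(children):
    """Invert the child lists: map every node to the list of its parents."""
    parents = {v: [] for v in children}
    for u, kids in children.items():
        for v in kids:
            parents[v].append(u)
    return parents


def is_temporal_representation(children, tau):
    """Check tau(u) < tau(v) on tree arcs and tau(u) == tau(v) on hybridization arcs."""
    parents = parents_of(children)
    for u, kids in children.items():
        for v in kids:
            if len(parents[v]) >= 2:          # (u, v) is a hybridization arc
                if tau[u] != tau[v]:
                    return False
            elif not tau[u] < tau[v]:         # (u, v) is a tree arc
                return False
    return True


def is_tree_child(children):
    """Every internal node must have at least one child with a single parent."""
    parents = parents_of(children)
    return all(any(len(parents[c]) <= 1 for c in kids)
               for kids in children.values() if kids)


# Hypothetical network: leaves 3 and 4 are hybrid (parents a,b and b,c).
NET = {"r": ["a", "b", "c"], "a": [1, 3], "b": [2, 3, 4], "c": [5, 4],
       1: [], 2: [], 3: [], 4: [], 5: []}
TAU = {"r": 0, "a": 1, "b": 1, "c": 1, 3: 1, 4: 1, 1: 2, 2: 2, 5: 2}
print(is_temporal_representation(NET, TAU), is_tree_child(NET))
```

If the tree leaf 2 is deleted from `NET`, the node b keeps only hybrid children and the tree-child condition fails, although the network remains time consistent.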

A tree path in a tree-child hybridization network is a non-trivial path such that its
end and all its intermediate nodes are tree nodes. A node $v$ is a tree descendant of a
node $u$ when there exists a tree path from $u$ to $v$. By [9, Lem. 2], every internal node $u$
of a tree-child hybridization network has some tree descendant leaf, and by [9, Cor. 4]
every tree descendant $v$ of $u$ is a strict descendant of $u$ and the path $u \rightsquigarrow v$ is unique.

To simplify the notations, we shall call TCTC-networks the tree-child time consistent
hybridization networks: these include the tree-child time consistent phylogenetic
networks, which were the objects dubbed TCTC-networks in [5,6]. Every phylogenetic
tree is also a TCTC-network. Let $TCTC_n$ denote the class of all TCTC-networks on
$S = \{1, \ldots, n\}$.

We prove now some basic properties of TCTC-networks that will be used later.

Lemma 3. Let $u$ be a node of a TCTC-network $N$, and let $v$ be a child of $u$. The node
$v$ is a tree node if, and only if, it is a strict descendant of $u$.

Proof. Assume first that $v$ is a tree child of $u$. Since $u$ is the only parent of $v$, every
non-trivial path ending in $v$ must contain $u$. This shows that $u$ is a strict ancestor of $v$.
Assume now that $v$ is a hybrid child of $u$ that is also a strict descendant of it, and
let us see that this leads to a contradiction. Indeed, in this case the set $H(u)$ of hybrid
children of $u$ that are strict descendants of it is non-empty, and we can choose a node
$v_0$ in it of largest height. Let $v_1$ be any parent of $v_0$ other than $u$. Since $u$ is a strict
ancestor of $v_0$, it must be an ancestor of $v_1$, and since $u$ and $v_1$ have the hybrid child $v_0$
in common, they must have the same temporal representation, and therefore $v_1$ as well
as all intermediate nodes in any path $u \rightsquigarrow v_1$ must be hybrid. Moreover, since $u$ is a strict
ancestor of $v_0$, it is also a strict ancestor of $v_1$ as well as of any intermediate node in any
path $u \rightsquigarrow v_1$ (by Lemma 1). In particular, the child of $u$ in a path $u \rightsquigarrow v_1$ will belong to
$H(u)$ and its height will be larger than the height of $v_0$, which is impossible. □

Corollary 1. All children of the root of a TCTC-network are tree nodes. 

Proof. Every node in a hybridization network is a strict descendant of the root. Then, 
Lemma 3 applies. 

The following result is the key ingredient in the proofs of our main results; it generalizes
to hybridization networks Lemma 3 in [6], which referred to phylogenetic networks.
A similar result was proved in [4] for tree-sibling (that is, where every hybrid node has
a sibling that is a tree node) time consistent phylogenetic networks with all their hybrid
nodes of in-degree 2.

Lemma 4. Every TCTC-network with more than one leaf contains at least one node $v$
satisfying one of the following properties:

(a) $v$ is an internal tree node and all its children are tree leaves.
(b) $v$ is a hybrid internal node, all its children are tree leaves, and all its siblings are
    leaves or hybrid nodes.
(c) $v$ is a hybrid leaf, and all its siblings are leaves or hybrid nodes.

Proof. Let $N$ be a TCTC-network and $\tau$ a temporal representation of it. Let $v_0$ be an
internal node of highest $\tau$-value and, among such nodes, of smallest height. The tree
children of $v_0$ have strictly higher $\tau$-value than $v_0$, and therefore they are leaves. And the
hybrid children of $v_0$ have the same $\tau$-value as $v_0$ but smaller height, and therefore
they are also leaves.
Now:

- If $v_0$ is a tree node all of whose children are tree nodes, taking $v = v_0$ we are in case
  (a).
- If $v_0$ is a hybrid node all of whose children are tree nodes, then its parents have its same
  $\tau$-value, which, we recall, is the highest one. This implies that their children ($v_0$'s
  siblings) cannot be internal tree nodes, and hence they are leaves or hybrid nodes.
  So, taking $v = v_0$, we are in case (b).
- If $v_0$ has some hybrid child, take as the node $v$ in the statement this hybrid child: it
  is a leaf, and all its parents have the same $\tau$-value as $v_0$, which implies, arguing as in
  the previous case, that all siblings of $v$ are leaves or hybrid nodes. Thus, $v$ satisfies
  (c). □

We introduce now some reductions for TCTC-networks. Each of these reductions
applied to a TCTC-network with $n$ leaves and $m$ internal nodes produces a
TCTC-network with either $n-1$ leaves and $m$ internal nodes or with $n$ leaves and $m-1$
internal nodes, and given any TCTC-network with more than two leaves, it will always
be possible to apply to it some of these reductions. This lies at the basis of the proofs
by algebraic induction of the main results in this paper.

Let $N$ be a TCTC-network with $n \ge 3$ leaves.

(U) Let $i$ be one tree leaf of $N$ and assume that its parent has only this child. The $U(i)$
reduction of $N$ is the network $N_{U(i)}$ obtained by removing the leaf $i$, together with its
incoming arc, and labeling with $i$ its former parent; cf. Fig. 2. This reduction removes
the only child of a node, and thus it is clear that $N_{U(i)}$ is still a TCTC-network, with
the same number of leaves but one internal node less than $N$.

Fig. 2. The $U(i)$-reduction.



(T) Let $i,j$ be two sibling tree leaves of $N$ (that may, or may not, have other siblings). The
$T(i;j)$ reduction of $N$ is the network $N_{T(i;j)}$ obtained by removing the leaf $i$, together
with its incoming arc; cf. Fig. 3. This reduction procedure removes one tree leaf, but
its parent $u$ keeps at least another tree child, and if $u$ was the root of $N$ then it would
not become elementary after the reduction, because $n \ge 3$ and therefore, since $j$ is a
leaf, $u$ should have at least another child. Therefore, $N_{T(i;j)}$ is a TCTC-network with
the same number of internal nodes as $N$ and $n-1$ leaves.

Fig. 3. The $T(i;j)$-reduction.

(H) Let $i$ be a hybrid leaf of $N$, let $v_1, \ldots, v_k$, with $k \ge 2$, be its parents, and assume
that each one of these parents has (at least) one tree leaf child: for every $l = 1, \ldots, k$,
let $j_l$ be a tree leaf child of $v_l$. The $H(i; j_1, \ldots, j_k)$ reduction of $N$ is the network
$N_{H(i;j_1,\ldots,j_k)}$ obtained by removing the hybrid leaf $i$ and its incoming arcs; cf. Fig. 4.
This reduction procedure preserves the time consistency and the tree-child condition
(it removes a hybrid leaf), and the root does not become elementary: indeed, the
only possibility for the root to become elementary is to be one of the parents of $i$,
which is impossible by Corollary 1. Therefore, $N_{H(i;j_1,\ldots,j_k)}$ is a TCTC-network with
the same number of internal nodes as $N$ and $n-1$ leaves.

Fig. 4. The $H(i; j_1, \ldots, j_k)$-reduction.



We shall call the inverses of the U, T, and H reduction procedures, respectively, the
$U^{-1}$, $T^{-1}$, and $H^{-1}$ expansions, and we shall denote them by $U^{-1}(i)$, $T^{-1}(i;j)$, and
$H^{-1}(i; j_1, \ldots, j_k)$. More specifically, for every TCTC-network $N$:

- If $N$ has some leaf labeled $i$, the expansion $U^{-1}(i)$ can be applied to $N$ and the
  resulting network $N_{U^{-1}(i)}$ is obtained by unlabeling the leaf $i$ and adding to it a tree
  leaf child labeled with $i$. $N_{U^{-1}(i)}$ is always a TCTC-network.

- If $N$ has no leaf labeled with $i$ and some tree leaf labeled with $j$, the expansion
  $T^{-1}(i;j)$ can be applied to $N$, and the resulting network $N_{T^{-1}(i;j)}$ is obtained by
  adding to the parent of the leaf $j$ a new tree leaf child labeled with $i$. $N_{T^{-1}(i;j)}$ is
  always a TCTC-network.

- If $N$ has no leaf labeled with $i$ and some tree leaves labeled with $j_1, \ldots, j_k$, $k \ge 2$,
  that are not siblings of each other, the expansion $H^{-1}(i; j_1, \ldots, j_k)$ can be applied to
  $N$ and the resulting network $N_{H^{-1}(i;j_1,\ldots,j_k)}$ is obtained by adding a new hybrid node
  labeled with $i$ and arcs from the parents of $j_1, \ldots, j_k$ to $i$. $N_{H^{-1}(i;j_1,\ldots,j_k)}$ is always a
  tree-child hybridization network, but it need not be time consistent, as the parents
  of $j_1, \ldots, j_k$ may have different temporal representations in $N$ (for instance, one of
  them could be a tree descendant of another one).

The following result is easily deduced from the explicit descriptions of the reduction 
and expansion procedures, and the fact that isomorphisms preserve labels and parents. 

Lemma 5. Let $N$ and $N'$ be two TCTC-networks. If $N \cong N'$, then the results of applying
to both $N$ and $N'$ the same U reduction (respectively, T reduction, H reduction, $U^{-1}$
expansion, $T^{-1}$ expansion, or $H^{-1}$ expansion) are again two isomorphic hybridization
networks.

Moreover, if we apply a U reduction (respectively, T reduction, or H reduction) to
a TCTC-network $N$, and then we apply to the resulting TCTC-network the inverse $U^{-1}$
expansion (respectively, $T^{-1}$ expansion, or $H^{-1}$ expansion), we obtain a TCTC-network
isomorphic to $N$. □

As we said above, every TCTC-network with at least 3 leaves allows the application 
of some reduction. 

Proposition 3. Let $N$ be a TCTC-network with more than two leaves. Then, at least
one U, T, or H reduction can be applied to $N$.

Proof. By Lemma 4, $N$ contains either an internal (tree or hybrid) node $v$ all of whose
children are tree leaves, or a hybrid leaf $i$ all of whose siblings are leaves or hybrid nodes.
In the first case, we can apply to $N$ either the reduction $U(i)$ (if $v$ has only one child,
and it is the tree leaf $i$) or $T(i;j)$ (if $v$ has at least two tree leaf children, $i$ and $j$). In the
second case, let $v_1, \ldots, v_k$, with $k \ge 2$, be the parents of $i$. By the tree-child condition,
each $v_l$, with $l = 1, \ldots, k$, has some tree child, and by the assumption on $i$, it will be a
leaf, say $j_l$. Then, we can apply to $N$ the reduction $H(i; j_1, \ldots, j_k)$. □
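
The case analysis in this proof suggests a direct way of listing the reductions available in a given network. The following Python sketch is an illustration under stated assumptions: networks are encoded as dictionaries from nodes to lists of children, the function `applicable_reductions` and the network `NET` are hypothetical, and the input is assumed (not checked) to be a TCTC-network.

```python
def parents_of(children):
    """Invert the child lists: map every node to the list of its parents."""
    parents = {v: [] for v in children}
    for u, kids in children.items():
        for v in kids:
            parents[v].append(u)
    return parents


def applicable_reductions(children):
    """List U(i), T(i;j) and H(i;j_1,...,j_k) reductions as tagged tuples."""
    parents = parents_of(children)
    leaves = [v for v in children if not children[v]]
    if len(leaves) < 3:          # reductions are only defined for n >= 3
        return []

    def is_tree_leaf(v):
        return not children[v] and len(parents[v]) == 1

    found = []
    for u, kids in children.items():
        tree_leaves = [c for c in kids if is_tree_leaf(c)]
        if len(kids) == 1 and tree_leaves:
            found.append(("U", tree_leaves[0]))
        # T(i;j) removes i and keeps its sibling tree leaf j.
        found += [("T", i, j) for i in tree_leaves for j in tree_leaves if i != j]
    for i in leaves:
        if len(parents[i]) >= 2:
            picks = []
            for v in parents[i]:
                tree_leaves = [c for c in children[v] if is_tree_leaf(c)]
                if not tree_leaves:
                    break        # some parent of i has no tree leaf child
                picks.append(tree_leaves[0])
            else:
                found.append(("H", i, *picks))
    return found


# Hypothetical TCTC-network: leaves 3 and 4 are hybrid (parents a,b and b,c).
NET = {"r": ["a", "b", "c"], "a": [1, 3], "b": [2, 3, 4], "c": [5, 4],
       1: [], 2: [], 3: [], 4: [], 5: []}
print(applicable_reductions(NET))
```

In `NET` only the two H reductions $H(3;1,2)$ and $H(4;2,5)$ are available; after removing the hybrid leaf 3, the node a is left with the single child 1 and the reduction $U(1)$ becomes applicable.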

Therefore, every TCTC-network with $n \ge 3$ leaves and $m$ internal nodes is obtained
by the application of a $U^{-1}$, $T^{-1}$, or $H^{-1}$ expansion to a TCTC-network with either
$n-1$ leaves or $n$ leaves and $m-1$ internal nodes. This allows the recursive construction
of all TCTC-networks from TCTC-networks (actually, phylogenetic trees) with 2 leaves
and 1 internal node.

Example 1. Fig. 5 shows how a sequence of reductions transforms a certain TCTC- 
network N with 4 leaves into a phylogenetic tree with 2 leaves. The sequence of inverse 
expansions would then generate N from this phylogenetic tree. This sequence of expan- 
sions generating N is, of course, not unique. 



Fig. 5. A sequence of reductions.

4 Path lengths vectors for fully resolved networks

Let $N$ be a hybridization network on $S = \{1, \ldots, n\}$. For every pair of leaves $i,j$ of $N$,
let $\ell_N(i,j)$ and $\ell_N(j,i)$ be the distance from $[i,j]$ to $i$ and to $j$, respectively.

Definition 3. The LCSA-path length between two leaves $i$ and $j$ in $N$ is

$$L_N(i,j) = \ell_N(i,j) + \ell_N(j,i).$$

The LCSA-path lengths vector of $N$ is

$$L(N) = \big(L_N(i,j)\big)_{1 \le i < j \le n} \in \mathbb{N}^{n(n-1)/2}$$

with its entries ordered lexicographically in $(i,j)$.

Notice that $L_N(i,j) = L_N(j,i)$, for every pair of leaves $i,j \in S$.

If $N$ is a phylogenetic tree, the LCSA-path length between two leaves is the path
length between them as defined in §2.2, and therefore the vectors $L(N)$ defined therein
and here are the same. But, contrary to what happens in phylogenetic trees, the
LCSA-path length between two leaves $i$ and $j$ in a hybridization network need not be the
smallest sum of the distances from a common ancestor of $i$ and $j$ to these leaves (that
is, the distance between these leaves in the undirected graph associated to the network).

Example 2. Consider the TCTC-network $N$ depicted in Fig. 6. Table 1 gives, in its
upper triangle, the LCSA of every pair of different leaves, and in its lower triangle, the
LCSA-path length between every pair of different leaves.

Notice that, in this network, $[3,5] = r$, because the root is the only common ancestor
of 3 and 5 that is a strict ancestor of some of them, and hence $L_N(3,5) = 8$, but $e$ is
a common ancestor of both leaves and the length of both paths $e \rightsquigarrow 3$ and $e \rightsquigarrow 5$ is 3.
Similarly, $f$ is also a common ancestor of both leaves and the length of both paths $f \rightsquigarrow 3$
and $f \rightsquigarrow 5$ is 3. This is an example of an LCSA-path length between two leaves that is
larger than the smallest sum of the distances from a common ancestor of these leaves
to each one of them (here, $3 + 3 = 6 < 8$).

In a fully resolved phylogeny with reticulation events, every non extant species should 
have two direct descendants, and every reticulation event should involve two parent 
species, as such an event corresponds always to the exchange of genetic information 
between two parents: as Semple points out [23], hybrid nodes with in-degree greater than 
2 actually represent "an uncertainty of the exact order of 'hybridization'." Depending 
on whether we use hybridization or phylogenetic networks to model phylogenies, we 
distinguish between:

Fig. 6. The network $N$ in Example 2.

Table 1. For every $1 \le i < j \le 6$, the entry $(i,j)$ of this table is $[i,j]$, and the entry $(j,i)$ is $L_N(i,j)$,
with $N$ the network in Fig. 6.

        1    2    3    4    5    6
   1    -    e    e    r    a    r
   2    4    -    b    r    e    r
   3    5    3    -    c    r    f
   4    6    6    3    -    f    f
   5    3    5    8    5    -    d
   6    6    6    5    4    3    -


— Fully resolved hybridization networks: hybridization networks with all their nodes of
types (0,2), (1,0), (1,2), (2,0), or (2,2).

— Fully resolved phylogenetic networks: phylogenetic networks with all their nodes of
types (0,2), (1,0), (1,2), or (2,1).

To simplify the language, we shall say that a hybridization network is quasi-binary when 
all its nodes are of types (0,2), (1,0), (1,2), (2,0), (2,1), or (2,2). These quasi-binary 
networks include as special cases the fully resolved hybridization and phylogenetic net- 
works. 

Our main result in this section establishes that the LCSA-path lengths vectors sep- 
arate fully resolved (hybridization or phylogenetic) TCTC-networks, thus generalizing 
Proposition 1 from trees to networks. To prove this result, we shall use the same strat- 
egy as the one developed in [4] or [6] to prove that the metrics introduced therein were 
indeed metrics: algebraic induction based on reductions. Now, we cannot use the reduc- 
tions defined in the last section as they stand, because they may generate elementary 
nodes that are forbidden in fully resolved networks. Instead, we shall use certain suitable 
combinations of them that always reduce the number of leaves by one.

So, consider the following reduction procedures for quasi-binary TCTC-networks $N$
with $n$ leaves:

(R) Let $i,j$ be two sibling tree leaves of $N$. The $R(i;j)$ reduction of $N$ is the quasi-binary
TCTC-network $N_{R(i;j)}$ obtained by applying first the $T(i;j)$ reduction to $N$ and then
the $U(j)$ reduction to the resulting network. The final result is that the leaves $i$ and
$j$ are removed, together with their incoming arcs, and then their former common
parent, which has now become a leaf, is labeled with $j$; cf. Fig. 7.








Fig. 7. The $R(i;j)$-reduction.

($H_0$) Let $i$ be a hybrid leaf, let $v_1$ and $v_2$ be its parents and assume that the other children
of these parents are tree leaves $j_1$ and $j_2$, respectively. The $H_0(i;j_1,j_2)$ reduction
of $N$ is the quasi-binary TCTC-network $N_{H_0(i;j_1,j_2)}$ obtained by applying first the
reduction $H(i;j_1,j_2)$ to $N$ and then the reductions $U(j_1)$ and $U(j_2)$ to the resulting
network. The overall effect is that the hybrid leaf $i$ and the tree leaves $j_1,j_2$ are
removed, together with their incoming arcs, and then the former parents $v_1,v_2$ of $j_1$
and $j_2$ are labeled with $j_1$ and $j_2$, respectively; cf. Fig. 8.

Fig. 8. The $H_0(i;j_1,j_2)$-reduction.

($H_1$) Let $A$ be a hybrid node with only one child $i$, which is a tree leaf. Let $v_1$ and $v_2$
be the parents of $A$ and assume that the other children of these parents are tree
leaves $j_1$ and $j_2$, respectively. The $H_1(i;j_1,j_2)$ reduction of $N$ is the TCTC-network
$N_{H_1(i;j_1,j_2)}$ obtained by applying first the reduction $U(i)$ to $N$, followed by the
reduction $H_0(i;j_1,j_2)$ to the resulting network. The overall effect is that the leaf $i$, its
parent $A$ and the leaves $j_1,j_2$ are removed, together with their incoming arcs, and
then the former parents $v_1,v_2$ of $j_1$ and $j_2$ are labeled with $j_1$ and $j_2$, respectively;
cf. Fig. 9.

Fig. 9. The $H_1(i;j_1,j_2)$-reduction.

We use $H_0$ and $H_1$ instead of $H$ and $U$ because, for our purposes in this section, it has
to be possible to decide whether or not we can apply a given reduction to a given fully
resolved network $N$ from the knowledge of $L(N)$, and this cannot be done for the $U$
reduction, while, as we shall see below, it is possible for $H_0$ and $H_1$.

$H_0$ reductions cannot be applied to fully resolved phylogenetic networks (they don't
have hybrid leaves) and $H_1$ reductions cannot be applied to fully resolved hybridization
networks (they don't have out-degree 1 hybrid nodes). The application of an $R$ or an $H_0$
reduction to a fully resolved TCTC hybridization network is again a fully resolved TCTC
hybridization network, and the application of an $R$ or an $H_1$ reduction to a fully resolved
TCTC phylogenetic network is again a fully resolved TCTC phylogenetic network.

We shall call the inverses of the $R$, $H_0$ and $H_1$ reduction procedures, respectively, the
$R^{-1}$, $H_0^{-1}$ and $H_1^{-1}$ expansions, and we shall denote them by $R^{-1}(i;j)$, $H_0^{-1}(i;j_1,j_2)$ and
$H_1^{-1}(i;j_1,j_2)$. More specifically, for every quasi-binary TCTC-network $N$ with no leaf
labeled $i$:

— the expansion $R^{-1}(i;j)$ can be applied to $N$ if it has a leaf labeled $j$, and the resulting
network $N_{R^{-1}(i;j)}$ is obtained by unlabeling the leaf $j$ and adding to it two tree leaf
children labeled with $i$ and $j$;

— the expansion $H_0^{-1}(i;j_1,j_2)$ can be applied to $N$ if it has a pair of leaves labeled
$j_1, j_2$, and the resulting network $N_{H_0^{-1}(i;j_1,j_2)}$ is obtained by adding a new hybrid leaf
labeled with $i$, and then, for each $l = 1,2$, unlabeling the leaf $j_l$ and adding to it a
new tree leaf child labeled with $j_l$ and an arc to $i$;

— the expansion $H_1^{-1}(i;j_1,j_2)$ can be applied to $N$ if it has a pair of leaves labeled
$j_1, j_2$, and the resulting network $N_{H_1^{-1}(i;j_1,j_2)}$ is obtained by adding a new node $A$, a
tree leaf child $i$ to it, and then, for each $l = 1,2$, unlabeling the leaf $j_l$ and adding to
it a new tree leaf child labeled with $j_l$ and an arc to $A$.

An $R^{-1}(i;j)$ expansion of a quasi-binary TCTC-network is always a quasi-binary TCTC-network,
but an $H_0^{-1}(i;j_1,j_2)$ or an $H_1^{-1}(i;j_1,j_2)$ expansion of a quasi-binary TCTC-network,
while still being always quasi-binary and tree-child, need not be time consistent:
for instance, the leaves $j_1$ and $j_2$ could be a hybrid leaf and a tree sibling of it. Moreover,
we have the following result, which is a direct consequence of Lemma 5; we state it
for further reference.

Lemma 6. Let $N$ and $N'$ be two quasi-binary TCTC-networks. If $N \cong N'$, then the
result of applying to both $N$ and $N'$ the same $R$ reduction (respectively, $H_0$ reduction, $H_1$
reduction, $R^{-1}$ expansion, $H_0^{-1}$ expansion, or $H_1^{-1}$ expansion) is again two isomorphic
networks.

Moreover, if we apply an $R$ reduction (respectively, $H_0$ reduction or $H_1$ reduction) to
a quasi-binary TCTC-network $N$, and then we apply to the resulting network the inverse
$R^{-1}$ expansion (respectively, $H_0^{-1}$ expansion or $H_1^{-1}$ expansion), we obtain a quasi-binary
TCTC-network isomorphic to $N$. □

We have moreover the following result. 

Proposition 4. Let $N$ be a quasi-binary TCTC-network with more than one leaf. Then,
at least one $R$, $H_0$, or $H_1$ reduction can be applied to $N$.

Proof. If $N$ contains some internal node with two tree leaf children $i$ and $j$, then the
reduction $R(i;j)$ can be applied. If $N$ does not contain any node with two tree leaf
children, then, by Lemma 4, it contains a hybrid node $v$ that is either a leaf (say, labeled
with $i$) or it has only one child, which is a tree leaf (say, labeled with $i$), and such that
all siblings of $v$ are leaves or hybrid nodes. Now, the quasi-binarity of $N$ and the tree-child
condition entail that $v$ has two parents, that each one of them has exactly one child
other than $v$, and that this second child is a tree node. So, $v$ has exactly two siblings,
and they are tree leaves, say $j_1$ and $j_2$. Then, the reduction $H_0(i;j_1,j_2)$ (if $v$ is a leaf)
or $H_1(i;j_1,j_2)$ (if $v$ is not a leaf) can be applied.

Corollary 2. (a) If $N$ is a fully resolved TCTC hybridization network with more than
one leaf, then at least one $R$ or $H_0$ reduction can be applied to it.

(b) If $N$ is a fully resolved TCTC phylogenetic network with more than one leaf, then
at least one $R$ or $H_1$ reduction can be applied to it. □

We shall prove now that the application conditions for the reductions introduced 
above can be read from the LCSA-path lengths vector of a fully resolved TCTC-network 
and that they modify in a specific way the LCSA-path lengths of the network which they 
are applied to. This will entail that if two fully resolved (hybridization or phylogenetic) 
TCTC-networks have the same LCSA-path lengths vectors, then the same reductions can 
be applied to both networks and the resulting fully resolved TCTC-networks still have 
the same LCSA-path lengths vectors. This will be the basis of the proof by induction on 
the number of leaves that two TCTC hybridization or phylogenetic networks with the 
same LCSA-path lengths vectors are always isomorphic. 

Lemma 7. Let $i,j$ be two leaves of a quasi-binary TCTC-network $N$. Then, $i$ and $j$ are
siblings if, and only if, $L_N(i,j) = 2$.

Proof. If $L_N(i,j) = 2$, then the paths $[i,j] \rightsquigarrow i$ and $[i,j] \rightsquigarrow j$ have length 1, and therefore
$[i,j]$ is a parent of $i$ and $j$. Conversely, if $i$ and $j$ are siblings and $u$ is a parent in common
of them, then, by the quasi-binarity of $N$, they are the only children of $u$, and by the
tree-child condition, one of them, say $i$, is a tree node. But then, $u$ is a strict ancestor of
$i$, an ancestor of $j$, and no proper descendant of $u$ is an ancestor of both $i$ and $j$. This
implies that $u = [i,j]$ and hence that $L_N(i,j) = 2$.

Lemma 8. Let $N$ be a quasi-binary TCTC-network on a set $S$ of taxa.

(1) The reduction $R(i;j)$ can be applied to $N$ if, and only if, $L_N(i,j) = 2$ and, for every
$k \in S \setminus \{i,j\}$, $L_N(i,k) = L_N(j,k)$.

(2) If the reduction $R(i;j)$ can be applied to $N$, then

$$L_{N_{R(i;j)}}(j,k) = L_N(j,k) - 1 \quad \text{for every } k \in S \setminus \{i,j\}$$
$$L_{N_{R(i;j)}}(k,l) = L_N(k,l) \quad \text{for every } k,l \in S \setminus \{i,j\}$$

Proof. As far as (1) goes, $R(i;j)$ can be applied to $N$ if, and only if, the leaves $i$ and $j$
are siblings and of tree type. Now, if $i$ and $j$ are two tree sibling leaves and $u$ is their
parent, then on the one hand, $L_N(i,j) = 2$ by Lemma 7, and on the other hand, since,
by Lemma 2, $[i,k] = [u,k] = [j,k]$ for every leaf $k \neq i,j$, we have that

$$\ell_N(i,k) = \ell_N(j,k) = 1 + \text{distance from } [u,k] \text{ to } u$$
$$\ell_N(k,i) = \ell_N(k,j) = \text{distance from } [u,k] \text{ to } k$$

and therefore $L_N(i,k) = L_N(j,k)$ for every $k \in S \setminus \{i,j\}$.

Conversely, assume that $L_N(i,j) = 2$ and that $L_N(i,k) = L_N(j,k)$ for every $k \in
S \setminus \{i,j\}$. The fact that $L_N(i,j) = 2$ implies that $i$ and $j$ share a parent $u$. If one of these
leaves, say $i$, is hybrid, then the tree-child condition implies that the other, $j$, is of tree
type. Let now $v$ be the other parent of $i$ and $k$ a tree descendant leaf of $v$, and let $h$ be
the length of the unique path $v \rightsquigarrow k$. Then $v$ is a strict ancestor of $k$ and an ancestor of
$i$, and no proper tree descendant of $v$ can possibly be an ancestor of $i$: otherwise, there
would exist a path from a proper tree descendant of $v$ to $u$, and then the time consistency
property would forbid $u$ and $v$ to have a hybrid child in common. Therefore $v = [i,k]$
and $L_N(i,k) = h + 1$. Now, the only possibility for the equality $L_N(j,k) = h + 1$ to hold
is that some intermediate node in the path $v \rightsquigarrow k$ is an ancestor of the only parent $u$ of
$j$, which, as we have just seen, is impossible. This leads to a contradiction, which shows
that $i$ and $j$ are both tree sibling leaves. This finishes the proof of (1).

As far as (2) goes, in $N_{R(i;j)}$ we remove the leaf $i$ and we replace the leaf $j$ by its parent.
By Lemma 2, this does not modify the LCSA $[j,k]$ of $j$ and any other remaining leaf $k$,
and since we have shortened by 1 any path ending in $j$, we deduce that $L_{N_{R(i;j)}}(j,k) =
L_N(j,k) - 1$ for every $k \in S \setminus \{i,j\}$. On the other hand, for every $k,l \in S \setminus \{i,j\}$, the
reduction $R(i;j)$ has affected neither the LCSA $[k,l]$ of $k$ and $l$, nor the paths $[k,l] \rightsquigarrow k$
or $[k,l] \rightsquigarrow l$, which implies that $L_{N_{R(i;j)}}(k,l) = L_N(k,l)$.
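As an illustration of Lemma 8, the following Python sketch tests the application condition of an $R(i;j)$ reduction directly on the LCSA-path lengths and computes the lengths after the reduction. The representation (a dictionary keyed by unordered pairs of leaves) and the function names are assumptions made for this example only.

```python
from itertools import combinations

def can_apply_R(L, leaves, i, j):
    """Lemma 8.(1): L(i,j) = 2 and L(i,k) = L(j,k) for every other leaf k."""
    if L[frozenset((i, j))] != 2:
        return False
    return all(L[frozenset((i, k))] == L[frozenset((j, k))]
               for k in leaves if k not in (i, j))

def apply_R(L, leaves, i, j):
    """Lemma 8.(2): drop leaf i and shorten by 1 every length involving j."""
    rest = [k for k in leaves if k != i]
    new = {}
    for a, b in combinations(rest, 2):
        new[frozenset((a, b))] = L[frozenset((a, b))] - (1 if j in (a, b) else 0)
    return rest, new

# Phylogenetic tree ((1,2),3): L(1,2) = 2, L(1,3) = L(2,3) = 3.
tree = {frozenset((1, 2)): 2, frozenset((1, 3)): 3, frozenset((2, 3)): 3}
rest, reduced = apply_R(tree, [1, 2, 3], 1, 2)

# Network where leaf 3 is a hybrid child of the parents of leaves 1 and 2.
hybrid = {frozenset((1, 2)): 4, frozenset((1, 3)): 2, frozenset((2, 3)): 2}
```

In the second example leaves 1 and 3 are siblings, so $L(1,3) = 2$, but $L(1,2) \neq L(3,2)$ and the test correctly rejects $R(1;3)$.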

Lemma 9. Let $N$ be a fully resolved TCTC hybridization network on a set $S$ of taxa.

(1) The reduction $H_0(i;j_1,j_2)$ can be applied to $N$ if, and only if, $L_N(i,j_1) = L_N(i,j_2) = 2$.

(2) If the reduction $H_0(i;j_1,j_2)$ can be applied to $N$, then

$$L_{N_{H_0(i;j_1,j_2)}}(j_1,j_2) = L_N(j_1,j_2) - 2$$
$$L_{N_{H_0(i;j_1,j_2)}}(j_1,k) = L_N(j_1,k) - 1 \quad \text{for every } k \in S \setminus \{i,j_1,j_2\}$$
$$L_{N_{H_0(i;j_1,j_2)}}(j_2,k) = L_N(j_2,k) - 1 \quad \text{for every } k \in S \setminus \{i,j_1,j_2\}$$
$$L_{N_{H_0(i;j_1,j_2)}}(k,l) = L_N(k,l) \quad \text{for every } k,l \in S \setminus \{i,j_1,j_2\}$$

Proof. As far as (1) goes, the reduction $H_0(i;j_1,j_2)$ can be applied to $N$ if, and only
if, $i$ is a hybrid sibling of the tree leaves $j_1$ and $j_2$. If this last condition happens, then
$L_N(i,j_1) = 2$ and $L_N(i,j_2) = 2$ by Lemma 7. Conversely, $L_N(i,j_1) = L_N(i,j_2) = 2$
implies that $i, j_1$ and $i, j_2$ are pairs of sibling leaves. Since no node of $N$ can have more
than 2 children, and at least one of its children must be of tree type, this implies that $i$
is a hybrid node (with two different parents), and $j_1$ and $j_2$ are tree nodes.

As far as (2) goes, the tree leaves $j_1$ and $j_2$ are replaced by their parents. By Lemma
7, this does not affect any LCSA and it only shortens by 1 the paths ending in $j_1$ or $j_2$.
Thus, the $H_0(i;j_1,j_2)$ reduction does not affect the LCSA-path length between any pair
of remaining leaves other than $j_1$ and $j_2$, it shortens by 1 the LCSA-path length between
$j_1$ or $j_2$ and any remaining leaf other than $j_1$ or $j_2$, and it shortens by 2 the LCSA-path
length between $j_1$ and $j_2$.
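The effect of an $H_0$ reduction described in Lemma 9 can be sketched in Python as follows. The dictionary representation and the function names are hypothetical, and the application test is only meaningful when the input lengths come from a fully resolved TCTC hybridization network, as the lemma assumes.

```python
from itertools import combinations

def can_apply_H0(L, i, j1, j2):
    """Lemma 9.(1): leaf i is a sibling of both j1 and j2."""
    return L[frozenset((i, j1))] == 2 and L[frozenset((i, j2))] == 2

def apply_H0(L, leaves, i, j1, j2):
    """Lemma 9.(2): drop leaf i; each endpoint in {j1, j2} shortens the length by 1."""
    rest = [k for k in leaves if k != i]
    new = {}
    for a, b in combinations(rest, 2):
        shift = (a in (j1, j2)) + (b in (j1, j2))  # 0, 1 or 2
        new[frozenset((a, b))] = L[frozenset((a, b))] - shift
    return rest, new

# Network with root r, internal nodes a, b, tree leaves 1, 2,
# and hybrid leaf 3 with parents a and b.
L = {frozenset((1, 2)): 4, frozenset((1, 3)): 2, frozenset((2, 3)): 2}
rest, reduced = apply_H0(L, [1, 2, 3], 3, 1, 2)
```

After the reduction only leaves 1 and 2 remain, at LCSA-path length 2, which is the vector of the phylogenetic tree with two leaves.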

Lemma 10. Let $N$ be a fully resolved TCTC phylogenetic network on a set $S$ of taxa.

(1) The reduction $H_1(i;j_1,j_2)$ can be applied to $N$ if, and only if,

- $L_N(i,j_1) = L_N(i,j_2) = 3$,

- $L_N(j_1,j_2) \ge 4$,

- if $L_N(j_1,j_2) = 4$, then $L_N(j_1,k) = L_N(j_2,k)$ for every $k \in S \setminus \{j_1,j_2,i\}$.

(2) If the reduction $H_1(i;j_1,j_2)$ can be applied to $N$, then

$$L_{N_{H_1(i;j_1,j_2)}}(j_1,j_2) = L_N(j_1,j_2) - 2$$
$$L_{N_{H_1(i;j_1,j_2)}}(j_1,k) = L_N(j_1,k) - 1 \quad \text{for every } k \in S \setminus \{i,j_1,j_2\}$$
$$L_{N_{H_1(i;j_1,j_2)}}(j_2,k) = L_N(j_2,k) - 1 \quad \text{for every } k \in S \setminus \{i,j_1,j_2\}$$
$$L_{N_{H_1(i;j_1,j_2)}}(k,l) = L_N(k,l) \quad \text{for every } k,l \in S \setminus \{i,j_1,j_2\}$$

Proof. As far as (1) goes, the reduction $H_1(i;j_1,j_2)$ can be applied to $N$ if, and only if,
$j_1$ and $j_2$ are tree leaves that are not siblings and they share a sibling hybrid node that
has the tree leaf $i$ as its only child. Now, if this application condition for $H_1(i;j_1,j_2)$ is
satisfied, then $L_N(i,j_1) = 3$, because the parent of $j_1$ is an ancestor of $i$, a strict ancestor
of $j_1$, and clearly no proper descendant of it is an ancestor of $i$ and $j_1$; by a similar
reason, $L_N(i,j_2) = 3$. Moreover, since $j_1$ and $j_2$ are not siblings, $L_N(j_1,j_2) \ge 3$. But if
$L_N(j_1,j_2) = 3$, then there would exist an arc from the parent of $j_1$ to the parent of $j_2$,
or vice versa, which would entail a node of out-degree 3 that cannot exist in the fully
resolved network $N$. Therefore, $L_N(j_1,j_2) \ge 4$. Finally, if $L_N(j_1,j_2) = 4$, this means that
the parents $x$ and $y$ of $j_1$ and $j_2$ (that are tree nodes, because they have out-degree 2 and
$N$ is a phylogenetic network) are siblings: let $u$ be their parent in common. In this case,
no leaf other than $j_1, j_2, i$ is a descendant of $u$, and therefore, for every $k \in S \setminus \{j_1,j_2,i\}$,

$$[j_1,k] = [x,k] = [u,k] = [y,k] = [j_2,k]$$

by Lemma 4, and thus

$$\ell_N(j_1,k) = \ell_N(j_2,k) = 2 + \text{distance from } [u,k] \text{ to } u$$
$$\ell_N(k,j_1) = \ell_N(k,j_2) = \text{distance from } [u,k] \text{ to } k,$$

which implies that $L_N(j_1,k) = L_N(j_2,k)$.

Conversely, assume that $L_N(i,j_1) = L_N(i,j_2) = 3$, that $L_N(j_1,j_2) \ge 4$, and that if
$L_N(j_1,j_2) = 4$, then $L_N(j_1,k) = L_N(j_2,k)$ for every $k \in S \setminus \{j_1,j_2,i\}$. Let $x$, $y$ and
$z$ be the parents of $j_1$, $j_2$ and $i$, respectively. Notice that these parents are pairwise
different (otherwise, the LCSA-path length between a pair among $j_1,j_2,i$ would be 2).
Moreover, since $N$ is a phylogenetic network, $j_1$, $j_2$ and $i$ are tree nodes. Then, $L_N(i,j_1) =
L_N(i,j_2) = 3$ implies that there must exist an arc between the nodes $x$ and $z$ and an arc
between the nodes $y$ and $z$.

Now, if these arcs are $(z,x)$ and $(z,y)$, the node $z$ would have out-degree 3, which
is impossible. Assume now that $(x,z)$ and $(z,y)$ are arcs of $N$. In this case, both $z$ and
$x$ have out-degree 2, which implies (recall that $N$ is a phylogenetic network) that they
are tree nodes. Then, $x = [j_1,j_2]$ (it is an ancestor of $j_2$, a strict ancestor of $j_1$, and no
proper descendant of it is an ancestor of $j_1$ and $j_2$) and therefore $L_N(j_1,j_2) = 4$. In this
case, we assume that $L_N(j_1,k) = L_N(j_2,k)$ for every $k \in S \setminus \{j_1,j_2,i\}$. Now we must
distinguish two cases, depending on the type of node $y$:

— If $y$ is a tree node, let $p$ be its child other than $j_2$, and let $k$ be a tree descendant leaf
of $p$. In this case, $[j_1,k] = x$ and $[j_2,k] = y$ (by the same reason why $x$ is $[j_1,j_2]$),
and hence $L_N(j_1,k) = L_N(j_2,k) + 2$, against the assumption $L_N(j_1,k) = L_N(j_2,k)$.

— If $y$ is a hybrid node, let $p$ be its parent other than $z$, and let $k$ be a tree descendant
leaf of $p$ ($k \neq j_2$, because $j_2$ is not a tree descendant of $p$). In this case, $[j_2,k] = p$
(because $p$ is an ancestor of $j_2$ and a strict ancestor of $k$, and the time consistency
property implies that no intermediate node in the path $p \rightsquigarrow k$ can be an ancestor of
$y$). Now, if the length of the (only) path $p \rightsquigarrow k$ is $h$, then $L_N(j_2,k) = h + 2$, and
for the equality $L_N(j_1,k) = h + 2$ to hold, either the arc $(x,p)$ belongs to $N$, which
is impossible because $x$ would have out-degree 3, or a node in the path $p \rightsquigarrow k$ is an
ancestor of $x$, which is impossible because of the time consistency property.

In both cases we reach a contradiction that implies that the arcs $(x,z)$, $(z,y)$ do not exist
in $N$. By symmetry, the arcs $(y,z)$, $(z,x)$ do not exist in $N$, either. Therefore, the only
possibility is that $N$ contains the arcs $(x,z)$, $(y,z)$, that is, that $z$ is a hybrid child of the
nodes $x$ and $y$. This finishes the proof of (1).

As far as (2) goes, it is proved as in Lemma 9. 

Now we can prove the main results in this section. 

Proposition 5. Let $N$ and $N'$ be two fully resolved TCTC hybridization networks on
the same set $S$ of taxa. Then, $L(N) = L(N')$ if, and only if, $N \cong N'$.

Proof. The 'if' implication is obvious. We prove the 'only if' implication by induction on
the number $n$ of elements of $S$.

The cases $n = 1$ and $n = 2$ are straightforward, because there exists only one TCTC-network
on $S = \{1\}$ and one TCTC-network on $S = \{1,2\}$: the one-node graph and the
phylogenetic tree with leaves 1, 2, respectively.

Assume now that the thesis is true for fully resolved TCTC hybridization networks
with $n$ leaves, and let $N$ and $N'$ be two fully resolved TCTC hybridization networks on
the same set $S$ of $n + 1$ labels such that $L(N) = L(N')$. By Corollary 2.(a), an $R(i;j)$ or
an $H_0(i;j_1,j_2)$ reduction can be applied to $N$. Moreover, since the possibility of applying one such
reduction depends on the LCSA-path lengths vector by Lemmas 8.(1) and 9.(1), and
$L(N) = L(N')$, it will be possible to apply the same reduction to $N'$. So, let $N_1$ and $N_1'$
be the fully resolved TCTC hybridization networks obtained by applying the same $R$ or
$H_0$ reduction to $N$ and $N'$.

From Lemmas 8.(2) and 9.(2) we deduce that $L(N_1) = L(N_1')$ and hence, by the
induction hypothesis, $N_1 \cong N_1'$. Finally, if we apply to $N_1$ and $N_1'$ the $R^{-1}$ or $H_0^{-1}$
expansion that is inverse to the reduction applied to $N$ and $N'$, then, by Lemma 6, we
obtain again $N$ and $N'$ and they are isomorphic.

A similar argument, using Lemmas 8 and 10, proves the following result. 

Proposition 6. Let $N$ and $N'$ be two fully resolved TCTC phylogenetic networks on the
same set $S$ of taxa. Then, $L(N) = L(N')$ if, and only if, $N \cong N'$. □

Remark 2. The LCSA-path lengths vectors do not separate quasi-binary TCTC-networks.
Indeed, consider the TCTC-networks $N$, $N'$ depicted in Fig. 10. They are quasi-binary
(but neither fully resolved phylogenetic networks nor fully resolved hybridization networks),
and a simple computation shows that

$$L(N) = L(N') = (3,6,3,3,6,3).$$

The network $N$ in Fig. 10 also shows that Lemma 10.(1) is false for quasi-binary
hybridization networks.





Fig. 10. These two quasi-binary TCTC-networks have the same LCSA-path lengths vectors.

Let $\mathrm{FRH}_n$ (respectively, $\mathrm{FRP}_n$) denote the classes of fully resolved TCTC hybridization
(respectively, phylogenetic) networks on $S = \{1,\ldots,n\}$. We have just proved that
the mappings

$$L : \mathrm{FRH}_n \to \mathbb{R}^{n(n-1)/2}, \qquad L : \mathrm{FRP}_n \to \mathbb{R}^{n(n-1)/2}$$

are injective, and therefore they can be used to induce metrics on $\mathrm{FRH}_n$ and $\mathrm{FRP}_n$ from
metrics on $\mathbb{R}^{n(n-1)/2}$.

Proposition 7. For every $n \ge 1$, let $D$ be any metric on $\mathbb{R}^{n(n-1)/2}$. The mappings
$d : \mathrm{FRH}_n \times \mathrm{FRH}_n \to \mathbb{R}$ and $d : \mathrm{FRP}_n \times \mathrm{FRP}_n \to \mathbb{R}$ defined by $d(N_1,N_2) = D(L(N_1),L(N_2))$
satisfy the axioms of metrics up to isomorphisms:

(1) $d(N_1,N_2) \ge 0$,

(2) $d(N_1,N_2) = 0$ if and only if $N_1 \cong N_2$,

(3) $d(N_1,N_2) = d(N_2,N_1)$,

(4) $d(N_1,N_3) \le d(N_1,N_2) + d(N_2,N_3)$.

Proof. Properties (1), (3) and (4) are direct consequences of the corresponding properties
of $D$, while property (2) follows from the separation axiom for $D$ (which says that
$D(M_1,M_2) = 0$ if, and only if, $M_1 = M_2$) and Proposition 5 or 6, depending on the case.

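For instance, using as $D$ the Manhattan distance on $\mathbb{R}^{n(n-1)/2}$, we obtain the metric
on $\mathrm{FRH}_n$ or $\mathrm{FRP}_n$

$$d_1(N_1,N_2) = \sum_{1 \le i < j \le n} \big| L_{N_1}(i,j) - L_{N_2}(i,j) \big|,$$

and using as $D$ the Euclidean distance we obtain the metric

$$d_2(N_1,N_2) = \sqrt{\sum_{1 \le i < j \le n} \big( L_{N_1}(i,j) - L_{N_2}(i,j) \big)^2}.$$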

These metrics generalize to fully resolved TCTC (hybridization or phylogenetic) networks 
the classical distances for fully resolved phylogenetic trees introduced by Farris [11] and 
Clifford [29] around 1970. 
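For concreteness, the two metrics can be evaluated on LCSA-path lengths vectors with a few lines of Python. This is a minimal sketch: the function names `d1` and `d2` mirror the notation above, and the two input vectors are assumed to have been computed beforehand for networks on the same set of taxa.

```python
import math

def d1(L1, L2):
    """Manhattan distance between two LCSA-path lengths vectors."""
    return sum(abs(a - b) for a, b in zip(L1, L2))

def d2(L1, L2):
    """Euclidean distance between two LCSA-path lengths vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(L1, L2)))

# Vector of the phylogenetic tree ((1,2),3).
tree = (2, 3, 3)
# Vector of the network in which leaf 3 is a hybrid leaf whose two parents
# also have the tree leaves 1 and 2 as children, respectively.
hybrid = (4, 2, 2)

manhattan = d1(tree, hybrid)
euclidean = d2(tree, hybrid)
```

Both example networks are fully resolved TCTC hybridization networks on three leaves, so by Proposition 5 a distance of 0 would only be possible if they were isomorphic.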

5 Splitted path lengths vectors for arbitrary networks 

As we have seen in §2.2 and Remark 2, the path lengths vectors do not separate arbitrary
TCTC-networks. Since to separate arbitrary phylogenetic trees we splitted the path
lengths (Definition 2), we shall use the same strategy in the networks setting. In this
connection, we already proved in [6] that the matrix

$$\ell(N) = \big(\ell_N(i,j)\big)_{\substack{i=1,\ldots,n \\ j=1,\ldots,n}}$$

separates TCTC phylogenetic networks on $S = \{1,\ldots,n\}$ with tree nodes of arbitrary
out-degree and hybrid nodes of arbitrary in-degree. But this is not true for TCTC
hybridization networks, as the following example shows.

Fig. 11. These two hybridization TCTC-networks are such that $\ell_N(i,j) = \ell_{N'}(i,j)$, for every pair of
leaves $i,j$.



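Example 3. Consider the pair of non-isomorphic TCTC-networks $N$ and $N'$ depicted in
Fig. 11. A simple computation shows that

$$\ell(N) = \ell(N') = \begin{pmatrix}
0 & 1 & 2 & 2 & 1 & 2 \\
1 & 0 & 1 & 1 & 2 & 2 \\
2 & 1 & 0 & 2 & 2 & 2 \\
2 & 1 & 2 & 0 & 1 & 2 \\
1 & 2 & 2 & 1 & 0 & 1 \\
2 & 2 & 2 & 2 & 1 & 0
\end{pmatrix}$$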



So, in order to separate arbitrary TCTC-networks we need to add some extra information
to the distances $\ell_N(i,j)$ from LCSAs to leaves. The extra information we shall
use is whether the LCSA of each pair of leaves is a strict ancestor of one leaf or the other
(or both). So, for every pair of different leaves $i,j$ of $N$, let $h_N(i,j)$ be $-1$ if $[i,j]$ is a
strict ancestor of $i$ but not of $j$, $1$ if $[i,j]$ is a strict ancestor of $j$ but not of $i$, and $0$ if
$[i,j]$ is a strict ancestor of both $i$ and $j$. Notice that $h_N(j,i) = -h_N(i,j)$.

Definition 4. Let $N$ be a hybridization network on the set $S = \{1,\ldots,n\}$.

For every $i,j \in S$, the splitted LCSA-path length from $i$ to $j$ is the ordered 3-tuple

$$L^s_N(i,j) = \big(\ell_N(i,j), \ell_N(j,i), h_N(i,j)\big).$$

The splitted LCSA-path lengths vector of $N$ is

$$L^s(N) = \big(L^s_N(i,j)\big)_{1 \le i < j \le n} \in (\mathbb{N} \times \mathbb{N} \times \{-1,0,1\})^{n(n-1)/2}$$

with its entries ordered lexicographically in $(i,j)$.

Example 4. Consider the quasi-binary TCTC-networks $N$ and $N'$ depicted in Fig. 10.
Then

$$L^s(N) = \big((2,1,-1), (3,3,0), (1,2,-1), (1,2,1), (2,4,0), (1,2,-1)\big)$$
$$L^s(N') = \big((1,2,1), (2,4,0), (1,2,1), (1,2,-1), (3,3,0), (2,1,1)\big)$$

Example 5. Consider the TCTC-networks $N$ and $N'$ depicted in Fig. 11. Then

$$L^s(N) = \big((1,1,-1), (2,2,0), (2,2,0), (1,1,-1), (2,2,0), (1,1,1), (1,1,1),$$
$$(2,2,0), (2,2,0), (2,2,0), (2,2,0), (2,2,0), (1,1,-1), (2,2,0), (1,1,1)\big)$$

$$L^s(N') = \big((1,1,1), (2,2,0), (2,2,0), (1,1,1), (2,2,0), (1,1,-1), (1,1,-1),$$
$$(2,2,0), (2,2,0), (2,2,0), (2,2,0), (2,2,0), (1,1,1), (2,2,0), (1,1,-1)\big)$$

Remark 3. If $N$ is a phylogenetic tree on $S$, then $h_N(i,j) = 0$ for every $i,j \in S$.
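A possible way to compute the splitted LCSA-path lengths vector of Definition 4 is sketched below in Python. It assumes, as an interpretation rather than a statement from this section, that a node is a strict ancestor of a leaf when every path from the root to the leaf contains it, and that distances are lengths of shortest directed paths. The `children` dictionary encoding and all function names are hypothetical.

```python
from collections import deque
from itertools import combinations

def reach(children, start, banned=None):
    """Nodes reachable from `start` (inclusive) without passing through `banned`."""
    if start == banned:
        return set()
    seen, stack = {start}, [start]
    while stack:
        for c in children.get(stack.pop(), ()):
            if c != banned and c not in seen:
                seen.add(c)
                stack.append(c)
    return seen

def dist(children, u, v):
    """Length of a shortest directed path from u to v."""
    queue, seen = deque([(u, 0)]), {u}
    while queue:
        x, d = queue.popleft()
        if x == v:
            return d
        for c in children.get(x, ()):
            if c not in seen:
                seen.add(c)
                queue.append((c, d + 1))
    raise ValueError("v is not a descendant of u")

def splitted_vector(children, root, leaves):
    """Triples (l(i,j), l(j,i), h(i,j)) for i < j, in lexicographic order."""
    nodes = reach(children, root)
    desc = {u: reach(children, u) for u in nodes}
    def strict(u, x):
        return x not in reach(children, root, banned=u)
    out = []
    for i, j in combinations(sorted(leaves), 2):
        cands = [u for u in nodes if i in desc[u] and j in desc[u]
                 and (strict(u, i) or strict(u, j))]
        # the LCSA is the candidate lying below all the other candidates
        u = next(c for c in cands if all(c in desc[d] for d in cands))
        si, sj = strict(u, i), strict(u, j)
        h = 0 if (si and sj) else (-1 if si else 1)
        out.append((dist(children, u, i), dist(children, u, j), h))
    return out

# Hybrid leaf 3 with parents a and b; tree leaves 1 and 2.
network = {"r": ["a", "b"], "a": [1, 3], "b": [3, 2]}
# Phylogenetic tree ((1,2),3).
tree = {"r": ["u", 3], "u": [1, 2]}

ls_network = splitted_vector(network, "r", [1, 2, 3])
ls_tree = splitted_vector(tree, "r", [1, 2, 3])
```

In agreement with Remark 3, every third component is 0 for the tree, while the pairs involving the hybrid leaf get a nonzero third component in the network.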

We shall prove now that these splitted LCSA-path lengths vectors separate arbitrary
hybridization TCTC-networks. The master plan for proving it is similar to the one used
in the proof of Proposition 5: induction based on the fact that the application conditions
for the reductions introduced in Section 3 can be read in the splitted LCSA-path lengths
vectors of TCTC-networks and that these reductions modify in a controlled way these
vectors.

Lemma 11. Let $N$ be a TCTC-network on a set $S$ of taxa.

(1) The reduction $U(i)$ can be applied to $N$ if, and only if, $\ell_N(i,j) \ge 2$ for every
$j \in S \setminus \{i\}$.

(2) If the reduction $U(i)$ can be applied to $N$, then

$$L^s_{N_{U(i)}}(i,j) = L^s_N(i,j) - (1,0,0) \quad \text{for every } j \in S \setminus \{i\}$$
$$L^s_{N_{U(i)}}(j,k) = L^s_N(j,k) \quad \text{for every } j,k \in S \setminus \{i\}$$

Proof. As far as (1) goes, the reduction $U(i)$ can be applied to $N$ if, and only if, the leaf
$i$ is a tree node and the only child of its parent. Let us check now that this last condition
is equivalent to $\ell_N(i,j) \ge 2$ for every $j \in S \setminus \{i\}$. To do this, we distinguish three cases:

— Assume that $i$ is a tree node and the only child of its parent $x$. Then, for every
$j \in S \setminus \{i\}$, the LCSA of $i$ and $j$ is a proper ancestor of $x$, and therefore $\ell_N(i,j) \ge 2$.

— Assume that $i$ is a tree node and that it has a sibling $y$. Let $x$ be the parent of $i$
and $y$ and let $j$ be a tree descendant leaf of $y$. Then $[i,j] = x$, because $x$ is a strict
ancestor of $i$, an ancestor of $j$ and clearly no descendant of $x$ is an ancestor of both
$i$ and $j$. Therefore, in this case, $\ell_N(i,j) = 1$ for this leaf $j$.

— Assume that $i$ is a hybrid node. Let $x$ be any parent of $i$ and let $j$ be a tree descendant
leaf of $x$. Then, $[i,j] = x$, because $x$ is a strict ancestor of $j$, an ancestor of $i$, and no
intermediate node in the unique path $x \rightsquigarrow j$ is an ancestor of $i$ (it would violate the
time consistency property). Therefore, in this case, $\ell_N(i,j) = 1$ for this leaf $j$, too.

Since these three cases cover all possibilities, we conclude that $i$ is a tree node without
siblings if, and only if, $\ell_N(i,j) \ge 2$ for every $j \in S \setminus \{i\}$. This finishes the proof of (1).

As far as (2) goes, in $N_{U(i)}$ we replace the tree leaf $i$ by its parent. By Lemma 2, this
does not modify any LCSA, and it only shortens by 1 any path ending in $i$. Therefore

$$\ell_{N_{U(i)}}(i,j) = \ell_N(i,j) - 1, \quad \ell_{N_{U(i)}}(j,i) = \ell_N(j,i) \quad \text{for every } j \in S \setminus \{i\}$$
$$\ell_{N_{U(i)}}(j,k) = \ell_N(j,k), \quad \ell_{N_{U(i)}}(k,j) = \ell_N(k,j) \quad \text{for every } j,k \in S \setminus \{i\}$$

As far as the $h$ component of the splitted LCSA-path lengths goes, notice that a node
$u$ is a strict ancestor of a tree leaf $i$ if, and only if, it is a strict ancestor of its parent
$x$ (because every path ending in $i$ contains $x$). Therefore, an internal node of $N_{U(i)}$ is a
strict ancestor of the leaf $i$ in $N_{U(i)}$ if, and only if, it is a strict ancestor of the leaf $i$ in
$N$. On the other hand, replacing a tree leaf without siblings by its only parent does not
affect any path ending in another leaf, and therefore an internal node of $N_{U(i)}$ is a strict
ancestor of a leaf $j \neq i$ in $N_{U(i)}$ if, and only if, it is a strict ancestor of the leaf $j$ in $N$.
So, by Lemma 2, the LCSA of a pair of leaves in $N$ and in $N_{U(i)}$ is the same, and we
have just proved that this LCSA is a strict ancestor of exactly the same leaves in both
networks: this implies that

$$h_{N_{U(i)}}(i,j) = h_N(i,j) \quad \text{for every } j \in S \setminus \{i\}$$
$$h_{N_{U(i)}}(j,k) = h_N(j,k) \quad \text{for every } j,k \in S \setminus \{i\}$$

Lemma 12. Let $N$ be a TCTC-network on a set $S$ of taxa.

(1) The reduction $T(i;j)$ can be applied to $N$ if, and only if, $L^s_N(i,j) = (1,1,0)$.

(2) If the reduction $T(i;j)$ can be applied to $N$, then

$$L^s_{N_{T(i;j)}}(k,l) = L^s_N(k,l) \quad \text{for every } k,l \in S \setminus \{i\}$$

Proof. As far as (1) goes, $T(i;j)$ can be applied to $N$ if, and only if, the leaves $i$ and $j$ are
tree nodes and siblings. Let us prove that this last condition is equivalent to $\ell_N(i,j) =
\ell_N(j,i) = 1$ and $h_N(i,j) = 0$. Indeed, if the leaves $i$ and $j$ are tree nodes and siblings,
then their parent is their LCSA and moreover it is a strict ancestor of both of them,
which implies that $\ell_N(i,j) = \ell_N(j,i) = 1$ and $h_N(i,j) = 0$. Conversely, assume that
$\ell_N(i,j) = \ell_N(j,i) = 1$ and $h_N(i,j) = 0$. The equalities $\ell_N(i,j) = \ell_N(j,i) = 1$ imply that
$[i,j]$ is a parent of $i$ and $j$, and $h_N(i,j) = 0$ implies that this parent of $i$ and $j$ is a strict
ancestor of both of them, and therefore, by Lemma 3, that $i$ and $j$ are tree nodes. This
finishes the proof of (1).

As far as (2) goes, in $N_{T(i;j)}$ we simply remove the leaf $i$ without removing anything
else. Therefore, no path ending in a remaining leaf is affected, and as a consequence no
$L^s_N(k,l)$ with $k,l \neq i$ is modified.
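The application conditions of Lemmas 11 and 12 can be checked mechanically on the splitted LCSA-path lengths. The Python sketch below is only illustrative: it stores $L^s_N(i,j)$ for $i < j$ in a dictionary keyed by the ordered pair, and the helper names are hypothetical.

```python
def ell(Ls, i, j):
    """l_N(i,j): distance from the LCSA [i,j] to leaf i, for either order of i, j."""
    return Ls[(i, j)][0] if i < j else Ls[(j, i)][1]

def can_apply_U(Ls, leaves, i):
    """Lemma 11.(1): l_N(i,j) >= 2 for every other leaf j."""
    return all(ell(Ls, i, j) >= 2 for j in leaves if j != i)

def can_apply_T(Ls, i, j):
    """Lemma 12.(1): the splitted length of the pair is (1, 1, 0)."""
    return Ls[(min(i, j), max(i, j))] == (1, 1, 0)

def apply_T(Ls, leaves, i):
    """Lemma 12.(2): remove leaf i; all remaining triples are unchanged."""
    rest = [k for k in leaves if k != i]
    return rest, {p: t for p, t in Ls.items() if i not in p}

def apply_U(Ls, i):
    """Lemma 11.(2): shorten by 1 the distance from every LCSA to leaf i."""
    return {(a, b): (x - (a == i), y - (b == i), h)
            for (a, b), (x, y, h) in Ls.items()}

# Splitted lengths of the phylogenetic tree ((1,2),3).
Ls = {(1, 2): (1, 1, 0), (1, 3): (2, 1, 0), (2, 3): (2, 1, 0)}
leaves = [1, 2, 3]

rest, after_T = apply_T(Ls, leaves, 1)   # T(1;2): remove leaf 1
after_U = apply_U(after_T, 2)            # U(2): contract the arc ending in leaf 2
```

Applying $T(1;2)$ and then $U(2)$ reproduces the combined reduction $R(1;2)$ of Section 4 and ends in the splitted vector of the tree with leaves 2 and 3.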

Lemma 13. Let $N$ be a TCTC-network on a set $S$ of taxa.

(1) The reduction $H(i;j_1,\ldots,j_k)$ can be applied to $N$ if, and only if,

- $L^s_N(i,j_l) = (1,1,1)$, for every $l \in \{1,\ldots,k\}$.

- $\ell_N(j_a,j_b) \ge 2$ or $\ell_N(j_b,j_a) \ge 2$ for every $a,b \in \{1,\ldots,k\}$ with $a \neq b$.

- For every $s \notin \{j_1,\ldots,j_k\}$, if $\ell_N(i,s) = 1$ and $h_N(i,s) = 1$, then $\ell_N(j_l,s) = 1$
and $h_N(j_l,s) = 0$ for some $l \in \{1,\ldots,k\}$.

(2) If the reduction $H(i;j_1,\ldots,j_k)$ can be applied to $N$, then

$$L^s_{N_{H(i;j_1,\ldots,j_k)}}(s,t) = L^s_N(s,t) \quad \text{for every } s,t \in S \setminus \{i\}$$

Proof. As far as (1) goes, $H(i;j_1,\ldots,j_k)$ can be applied to $N$ if, and only if, $j_1,\ldots,j_k$
are tree leaves that are not siblings of each other, the leaf $i$ is a hybrid sibling of $j_1,\ldots,j_k$,
and the only parents of $i$ are those of $j_1,\ldots,j_k$. Now:

— For each I = 1, . . . ,k, the condition Lf^{i,ji) = (1, 1, 1) says that i and j; are sibling, 
and that their parent in common is a strict ancestor of j/ but not of i. Using Lemma 
3, we conclude that this condition is equivalent to the fact that i and ji are sibling, 
jl is a tree node, and i a hybrid node. 

— Assume that j'l, . . . ,jk are tree leaves, with parents vi, . . . ,Vk, respectively. In this 
case, the condition iN{ja-,Jb) ^ 2 or (■N{jb-,ja) ^ 2 is equivalent to the fact that 
ja,jb are not sibling. Indeed, if ja and ji, are sibling, then iNiJaJb) = ^NUbJa) = 1- 
Conversely, if ja and jb are not sibling, then there are two possibilities: either Va is 
an ancestor oi jb, but not its parent, in which case Va = \ja,jb] and (-N{jb-,ja) ^ 2, or 
Va is not an ancestor of j^, in which case [ja,jb\ is a proper ancestor of Va and hence 

(■N{ja,jb) ^ 2. 

— Assume that ji, . . . , jfc are tree leaves, with parents fi, . . . , Vfc, respectively, and that 
i is a hybrid sibling of them. Let us see that the only parents of i are vi,. . . ,Vk 
if, and only if, for every s ^ {j'l, . . . ,jk}-, ^Nihs) = 1 and h]\f{i,s) = 1 imply that 
InUi, s) = 1 and h^iji, s) = for some I = 1, . . . ,k. 

Indeed, assume that the only parents of $i$ are $v_1,\ldots,v_k$, and let $s\notin\{j_1,\ldots,j_k\}$ be a
leaf such that $\ell_N(i,s)=1$ and $h_N(i,s)=1$. Since $\ell_N(i,s)=1$, some parent of $i$, say
$v_l$, is the LCSA of $i$ and $s$, and $h_N(i,s)=1$ implies that $v_l$ is a strict ancestor of $s$.
But then $v_l$ will be the LCSA of its tree leaf $j_l$ and $s$ and a strict ancestor of both of
them, and thus $\ell_N(j_l,s)=1$ and $h_N(j_l,s)=0$.

Conversely, assume that, for every $s\notin\{j_1,\ldots,j_k\}$, $\ell_N(i,s)=1$ and $h_N(i,s)=1$
imply that $\ell_N(j_l,s)=1$ and $h_N(j_l,s)=0$ for some $l=1,\ldots,k$. Let $v$ be a parent of
$i$, and let $s$ be a tree descendant leaf of $v$. Then, $v=[i,s]$ ($v$ is a strict ancestor of $s$,
an ancestor of $i$, and no intermediate node in the unique path $v\rightsquigarrow s$ is an ancestor
of $i$, by the time consistency property) and thus $\ell_N(i,s)=1$; moreover, $h_N(i,s)=1$
by Lemma 3. Now, if $s=j_l$, for some $l=1,\ldots,k$, then $v=v_l$. On the other hand, if
$s\notin\{j_1,\ldots,j_k\}$, then by assumption, there will exist some $j_l$ such that $\ell_N(j_l,s)=1$
and $h_N(j_l,s)=0$, that is, such that $v_l$ is a strict ancestor of $s$. This implies that
$v=v_l$. Indeed, if $v\neq v_l$, then either $v_l$ is an intermediate node in the path $v\rightsquigarrow s$,
and in particular a tree descendant of $v$, which is forbidden by the time consistency
because $v$ and $v_l$ have the hybrid child $i$ in common, or $v$ is a proper descendant
of $v_l$ through a path where $v$ and all the intermediate nodes are hybrid (if some of
these nodes were of tree type, the temporal representation of $v$ would be greater than
that of $v_l$, contradicting again the time consistency), in which case the child of $v_l$ in
this path would be a hybrid child of $v_l$ that is a strict descendant of it (because it
is intermediate in the path $v_l\rightsquigarrow v\rightsquigarrow s$ and $s$ is a strict descendant of $v_l$), which is
impossible by Lemma 3.

This finishes the proof of (1).

As far as (2) goes, in $N_{H(i;j_1,\ldots,j_k)}$ we simply remove the hybrid leaf $i$ without removing
anything else, and therefore no splitted LCSA-path length of a pair of remaining leaves
is affected.
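
The three conditions analysed in the proof of (1) can be tested directly on the entries of the splitted LCSA-path lengths vector. The following Python fragment is only a minimal sketch under stated assumptions: the function names `table` and `h_reduction_applicable`, the dictionaries `ell` and `h` indexed by ordered pairs of leaves, and the sample network are hypothetical and not part of the paper. We take $h_N(a,b)=1$ to mean that $[a,b]$ is a strict ancestor of $b$ but not of $a$, as read off the proof above, and we assume $h_N(b,a)=-h_N(a,b)$. The sample network has a root with three tree children, each of which is the parent of one tree leaf (1, 3 and 4) and of the common hybrid leaf 2.

```python
from itertools import combinations

def table(entries):
    """Expand {(a, b): (ell(a,b), ell(b,a), h(a,b))} into two ordered-pair tables.
    Assumes h(b, a) = -h(a, b) (our convention, labelled as an assumption)."""
    ell, h = {}, {}
    for (a, b), (lab, lba, hab) in entries.items():
        ell[a, b], ell[b, a] = lab, lba
        h[a, b], h[b, a] = hab, -hab
    return ell, h

def h_reduction_applicable(leaves, ell, h, i, js):
    """Check the three conditions from the proof for H(i; j_1, ..., j_k)."""
    js = list(js)
    # i and each j_l are siblings, with j_l a tree node and i a hybrid node.
    if any((ell[i, j], ell[j, i], h[i, j]) != (1, 1, 1) for j in js):
        return False
    # No two of the leaves j_1, ..., j_k are siblings of each other.
    if any(ell[a, b] < 2 and ell[b, a] < 2 for a, b in combinations(js, 2)):
        return False
    # The only parents of i are the parents of j_1, ..., j_k.
    for s in leaves:
        if s == i or s in js:
            continue
        if ell[i, s] == 1 and h[i, s] == 1:
            if not any(ell[j, s] == 1 and h[j, s] == 0 for j in js):
                return False
    return True

# Hand-computed values for the sample network described in the lead-in.
ell, h = table({
    (2, 1): (1, 1, 1), (2, 3): (1, 1, 1), (2, 4): (1, 1, 1),
    (1, 3): (2, 2, 0), (1, 4): (2, 2, 0), (3, 4): (2, 2, 0),
})
applicable_all = h_reduction_applicable([1, 2, 3, 4], ell, h, 2, (1, 3, 4))
applicable_two = h_reduction_applicable([1, 2, 3, 4], ell, h, 2, (1, 3))
```

In this sample, `applicable_two` is false because the parent of leaf 4 is a third parent of the hybrid leaf 2, which is exactly the situation excluded by the third condition.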

Theorem 1. Let $N$ and $N'$ be two TCTC-networks on the same set $S$ of taxa. Then,
$L^s(N)=L^s(N')$ if, and only if, $N\cong N'$.

Proof. The 'if' implication is obvious. We prove the 'only if' implication by double
induction on the number $n$ of elements of $S$ and the number $m$ of internal nodes of $N$.

As in Proposition 5, the cases $n=1$ and $n=2$ are straightforward, because both
$\mathrm{TCTC}_1$ and $\mathrm{TCTC}_2$ consist of a single network.

On the other hand, the case when $m=1$, for every $n$, is also straightforward: assuming
$S=\{1,\ldots,n\}$, the network $N$ is in this case the phylogenetic tree with Newick string
(1,2,...,n); consisting only of the root and the leaves, and in particular $L^s_N(i,j)=(1,1,0)$
for every $1\leq i<j\leq n$. If $L^s(N)=L^s(N')$, we have that $L^s_{N'}(i,j)=(1,1,0)$
for every $1\leq i<j\leq n$, and therefore all leaves in $N'$ are tree nodes and siblings of each
other by Lemma 3. Since the root of a hybridization network cannot be elementary, this
says that $N'$ is also a phylogenetic tree with Newick string (1,2,...,n); and hence it
is isomorphic to $N$.

Let now $N$ and $N'$ be two TCTC-networks with $n\geq 3$ leaves such that $L^s(N)=L^s(N')$
and $N$ has $m\geq 2$ internal nodes. Assume as induction hypothesis that the thesis in the
theorem is true for pairs of TCTC-networks $N_1,N_1'$ with $n-1$ leaves or with $n$ leaves
and such that $N_1$ has $m-1$ internal nodes.

By Proposition 3, a reduction $U(i)$, $T(i;j)$ or $H(i;j_1,\ldots,j_k)$ can be applied to $N$.
Since the application conditions for such a reduction depend only on the splitted LCSA-
path lengths vectors by Lemmas 11.(1), 12.(1) and 13.(1), and $L^s(N)=L^s(N')$, we
conclude that we can apply the same reduction to $N'$.

Now, we apply the same reduction to $N$ and $N'$ to obtain new TCTC-networks $N_1$
and $N_1'$, respectively. If the reduction was of the form $U(i)$, $N_1$ and $N_1'$ have $n$ leaves and
$N_1$ has $m-1$ internal nodes; if the reduction was of the forms $T(i;j)$ or $H(i;j_1,\ldots,j_k)$,
$N_1$ and $N_1'$ have $n-1$ leaves. In all cases, $L^s(N_1)=L^s(N_1')$ by Lemmas 11.(2), 12.(2)
and 13.(2), and therefore, by the induction hypothesis, $N_1\cong N_1'$.

Finally, by Lemma 5, $N$ and $N'$ are obtained from $N_1$ and $N_1'$ by applying the same
expansion $U^{-1}$, $T^{-1}$, or $H^{-1}$, and they are isomorphic.

The vectors of splitted LCSA-path lengths do not separate hybridization
networks much more general than the TCTC ones, as the following examples show.

Remark 4. The vectors of splitted distances do not separate arbitrary (that is, possibly
time inconsistent) tree-child phylogenetic networks. Indeed, the non-isomorphic tree-child
binary phylogenetic networks $N$ and $N'$ depicted in Fig. 12 have the same $L^s$ vectors:

$$L^s(N)=L^s(N')=\big((2,1,1),(4,1,1),(3,1,1)\big).$$

Fig. 12. These two tree-child binary phylogenetic networks have the same splitted LCSA-path lengths
vectors.



Remark 5. The splitted LCSA-path lengths vectors do not separate tree-sibling time
consistent phylogenetic networks, either. Consider for instance the tree-sibling time
consistent fully resolved phylogenetic networks $N$ and $N'$ depicted in Fig. 13. A simple
computation shows that they have the same $L^s$ vectors, but they are not isomorphic.



As in the fully resolved case, the injectivity of the mapping

$$L^s:\mathrm{TCTC}_n\to\mathbb{R}^{3n(n-1)/2}$$

makes it possible to induce metrics on $\mathrm{TCTC}_n$ from metrics on $\mathbb{R}^{3n(n-1)/2}$. The proof of
the following result is similar to that of Proposition 7.



Proposition 8. For every $n\geq 1$, let $D$ be any metric on $\mathbb{R}^{3n(n-1)/2}$. The mapping
$d^s:\mathrm{TCTC}_n\times\mathrm{TCTC}_n\to\mathbb{R}$ defined by $d^s(N_1,N_2)=D(L^s(N_1),L^s(N_2))$ satisfies the
axioms of metrics up to isomorphisms. □

Fig. 13. These two tree-sibling time consistent binary phylogenetic networks have the same splitted
LCSA-path lengths vectors.



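For instance, using as $D$ the Manhattan distance or the Euclidean distance, we obtain,
respectively, the metrics on $\mathrm{TCTC}_n$

$$\begin{aligned}
d^s_1(N_1,N_2) &= \sum_{1\leq i<j\leq n}\Big(|\ell_{N_1}(i,j)-\ell_{N_2}(i,j)|+|\ell_{N_1}(j,i)-\ell_{N_2}(j,i)|+|h_{N_1}(i,j)-h_{N_2}(i,j)|\Big)\\
&= \sum_{1\leq i\neq j\leq n}\Big(|\ell_{N_1}(i,j)-\ell_{N_2}(i,j)|+\tfrac{1}{2}|h_{N_1}(i,j)-h_{N_2}(i,j)|\Big)
\end{aligned}$$

$$d^s_2(N_1,N_2) = \sqrt{\sum_{1\leq i<j\leq n}\Big((\ell_{N_1}(i,j)-\ell_{N_2}(i,j))^2+(\ell_{N_1}(j,i)-\ell_{N_2}(j,i))^2+(h_{N_1}(i,j)-h_{N_2}(i,j))^2\Big)}$$

These metrics generalize to TCTC-networks the splitted nodal metrics for arbitrary
phylogenetic trees defined in [7], and the nodal metric for TCTC phylogenetic networks
defined in [6].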
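
As an illustration of how such metrics are evaluated once the vectors are known, the following Python fragment is a minimal sketch, not an implementation taken from the paper. The names `d1` and `d2` and the representation of a vector as a dictionary mapping each pair $(i,j)$ with $i<j$ to its triple are assumptions made for the example. The two sample vectors were computed by hand for phylogenetic trees on three leaves: the tree with Newick string (1,2,3); and the tree with Newick string ((1,2),3);.

```python
import math

def d1(L1, L2):
    """Manhattan distance between two vectors of splitted path lengths."""
    if L1.keys() != L2.keys():
        raise ValueError("vectors must be indexed by the same pairs of leaves")
    return sum(abs(a - b) for pair in L1 for a, b in zip(L1[pair], L2[pair]))

def d2(L1, L2):
    """Euclidean distance between two vectors of splitted path lengths."""
    if L1.keys() != L2.keys():
        raise ValueError("vectors must be indexed by the same pairs of leaves")
    return math.sqrt(sum((a - b) ** 2
                         for pair in L1 for a, b in zip(L1[pair], L2[pair])))

# Tree (1,2,3); : every pair of leaves has the root as parent in common.
star = {(1, 2): (1, 1, 0), (1, 3): (1, 1, 0), (2, 3): (1, 1, 0)}
# Tree ((1,2),3); : leaves 1 and 2 are siblings, leaf 3 is a child of the root.
cherry = {(1, 2): (1, 1, 0), (1, 3): (2, 1, 0), (2, 3): (2, 1, 0)}

manhattan = d1(star, cherry)   # two entries differ by 1 each
euclidean = d2(star, cherry)
```

Any other metric for real-valued vectors could be substituted for these two in the same way.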

6 Conclusions 

A classical result of Smolenskii [24] establishes that the vectors of distances between 
pairs of leaves separate unrooted phylogenetic trees on a given set of taxa. This result 
generalizes easily to fully resolved rooted phylogenetic trees [7], and it lies at the basis of
the classical definitions of nodal distances for unrooted as well as for fully resolved rooted 
phylogenetic trees based on the comparison of these vectors [3,11,12,22,26,29]. But these 
vectors do not separate arbitrary rooted phylogenetic trees, and therefore they cannot 
be used to compare the latter in a sound way. This problem was overcome in [7] by 
introducing the splitted path lengths matrices and showing that they separate arbitrary 
rooted phylogenetic trees on a given set of taxa. It is possible then to define splitted nodal 
metrics for arbitrary rooted phylogenetic trees by comparing these matrices. 

In this paper we have generalized these results to the class $\mathrm{TCTC}_n$ of tree-child time
consistent hybridization networks (TCTC-networks) with $n$ leaves. For every pair $i,j$ of
leaves in a TCTC-network $N$, we have defined the LCSA-path length $L_N(i,j)$ and the
splitted LCSA-path length $L^s_N(i,j)$ between $i$ and $j$ and we have proved that the vectors
$L(N)=(L_N(i,j))_{1\leq i<j\leq n}$ separate fully resolved networks in $\mathrm{TCTC}_n$ and the vectors
$L^s(N)=(L^s_N(i,j))_{1\leq i<j\leq n}$ separate arbitrary TCTC-networks.

The vectors $L(N)$ and $L^s(N)$ can be computed in low polynomial time by means of
simple algorithms that do not require the use of sophisticated data structures. Indeed,
let $n$ be the number of leaves and $m$ the number of internal nodes in $N$. As we explained
in [5, §V.D], for each internal node $v$ and for each leaf $i$, it can be decided whether $v$
is a strict or a non-strict ancestor of $i$, or not an ancestor of it at all, by computing
by breadth-first search the shortest paths from the root to each leaf before and after
removing each of the $m$ nodes in turn, because a non-strict descendant of a node will
still be reachable from the root after removing that node, while a strict descendant will
not. All this information can be computed in $O(m(n+m))$ time, and once it has been
computed the least common semi-strict ancestor of two leaves can be computed in $O(m)$
time by selecting the node of least height among those which are ancestors of the two
leaves and strict ancestors of at least one of them. This allows the computation of $L(N)$
and $L^s(N)$ in $O(m^2+n^2m)$ time.
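
The procedure just described can be sketched in Python as follows. This is a simplified illustration under stated assumptions rather than the algorithm of [5]: the names `bfs` and `splitted_lengths` and the encoding of the network as a dictionary `children` from each internal node to the list of its children are hypothetical, the height of a node is taken to be the length of a longest path from it to a leaf, and the third entry of each triple is set to 0 when the LCSA is a strict ancestor of both leaves, to 1 when it is a strict ancestor of the second leaf only, and to -1 when it is a strict ancestor of the first leaf only. No attempt is made to match the time bound stated above.

```python
from collections import deque

def bfs(children, start, removed=None):
    """Shortest-path distances from `start`, ignoring the node `removed`."""
    if start == removed:
        return {}
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in children.get(u, ()):
            if w != removed and w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def splitted_lengths(children, root):
    """Return {(i, j): (ell(i,j), ell(j,i), h(i,j))} for all leaves i < j."""
    nodes = set(children) | {w for ws in children.values() for w in ws}
    leaves = sorted(v for v in nodes if not children.get(v))
    internal = [v for v in nodes if children.get(v)]

    height = {}
    def height_of(v):
        # Length of a longest path from v to a leaf (memoized).
        if v not in height:
            kids = children.get(v, ())
            height[v] = 1 + max(height_of(w) for w in kids) if kids else 0
        return height[v]

    down = {v: bfs(children, v) for v in internal}   # descendants of v
    strict = {}
    for v in internal:
        # A strict descendant of v is unreachable from the root once v is removed.
        still = bfs(children, root, removed=v)
        strict[v] = {i for i in leaves if i in down[v] and i not in still}

    Ls = {}
    for a, i in enumerate(leaves):
        for j in leaves[a + 1:]:
            # Common ancestors that are strict ancestors of at least one leaf.
            cands = [v for v in internal
                     if i in down[v] and j in down[v]
                     and (i in strict[v] or j in strict[v])]
            v = min(cands, key=height_of)            # node of least height
            si, sj = i in strict[v], j in strict[v]
            h = 0 if (si and sj) else (1 if sj else -1)
            Ls[i, j] = (down[v][i], down[v][j], h)
    return Ls

# Sample network: root r with tree children u and v; u is the parent of
# leaves 1 and 2, v is the parent of leaves 2 and 3, so leaf 2 is hybrid.
children = {"r": ["u", "v"], "u": [1, 2], "v": [2, 3]}
Ls = splitted_lengths(children, "r")
```

For the sample network the least common semi-strict ancestor of leaves 1 and 2 is `u`, that of leaves 2 and 3 is `v`, and that of leaves 1 and 3 is the root.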

These vectors $L(N)$ and $L^s(N)$ can be used then to define metrics for fully resolved
and arbitrary TCTC-networks, respectively, from metrics for real-valued vectors. The
metrics obtained in this way can be understood as generalizations to $\mathrm{TCTC}_n$ of the
(non-splitted or splitted) nodal metrics for phylogenetic trees and they can be computed
in low polynomial time if the metric used to compare the vectors can be done so: this is
the case, for instance, when this metric is the Manhattan or the Euclidean metric (in the
last case, computing the square root with $O(10^{m+n})$ significant digits [2], which should
be more than enough).

It remains to study the main properties of the metrics defined in this way, like for
instance their diameter or the distribution of their values. It is important to recall here
that these are open problems even for the classical nodal distances for fully resolved
rooted phylogenetic trees.

Acknowledgment 

The research reported in this paper has been partially supported by the Spanish DGI 
projects MTM2006-07773 COMGRIO and MTM2006-15038-C02-01. 

References 

1. M. Baroni, C. Semple, M. Steel, Hybrids in real time, Syst. Biol. 55 (2006) 46-56. 

2. P. Batra, Newton's method and the computational complexity of the fundamental theorem of algebra,
Electron. Notes Theor. Comput. Sci. 202 (2008) 201-218.

3. J. Bluis, D.-G. Shin, Nodal distance algorithm: Calculating a phylogenetic tree comparison metric,
in: Proc. 3rd IEEE Symp. BioInformatics and BioEngineering, 2003.

4. G. Cardona, M. Llabrés, F. Rosselló, G. Valiente, A distance metric for a class of tree-sibling
phylogenetic networks, Bioinformatics 24 (13) (2008) 1481-1488.

5. G. Cardona, M. Llabrés, F. Rosselló, G. Valiente, Metrics for phylogenetic networks I: Generalizations
of the Robinson-Foulds metric, submitted (2008).

6. G. Cardona, M. Llabrés, F. Rosselló, G. Valiente, Metrics for phylogenetic networks II: Nodal and
triplets metrics, submitted (2008).

7. G. Cardona, M. Llabrés, F. Rosselló, G. Valiente, Nodal metrics for rooted phylogenetic trees,
submitted, available at arxiv.org/abs/0806.2035 (2008).

8. G. Cardona, F. Rosselló, G. Valiente, Comparison of tree-child phylogenetic networks, IEEE T.
Comput. Biol., preprint, 30 June 2008, doi:10.1109/TCBB.2007.70270.

9. G. Cardona, F. Rosselló, G. Valiente, Tripartitions do not always discriminate phylogenetic networks,
Math. Biosci. 211 (2) (2008) 356-370.

10. W. F. Doolittle, Phylogenetic classification and the universal tree, Science 284 (5423) (1999)
2124-2128.

11. J. S. Farris, A successive approximations approach to character weighting, Syst. Zool. 18 (1969) 
374-385. 

12. J. S. Farris, On comparing the shapes of taxonomic trees, Syst. Zool. 22 (1973) 50-54. 

13. D. Gusfield, S. Eddhu, C. Langley, The fine structure of galls in phylogenetic networks, INFORMS
J. Comput. 16 (4) (2004) 459-469.

14. D. Gusfield, S. Eddhu, C. Langley, Optimal, efficient reconstruction of phylogenetic networks with
constrained recombination, J. Bioinformatics Comput. Biol. 2 (1) (2004) 173-213.

15. J. Hein, M. H. Schierup, C. Wiuf, Gene Genealogies, Variation and Evolution: A Primer in Coalescent 
Theory, Oxford University Press, 2005. 

16. D. H. Huson, D. Bryant, Application of Phylogenetic Networks in Evolutionary Studies, Mol. Biol. 
Evol. 23 (2) (2006) 254-267. 

17. D. H. Huson, T. H. Klöpper, Beyond galled trees - decomposition and computation of galled networks,
in: Proceedings RECOMB 2007, vol. 4453 of Lecture Notes in Computer Science, Springer-Verlag,
2007.

18. B. M. E. Moret, L. Nakhleh, T. Warnow, C. R. Linder, A. Tholse, A. Padolina, J. Sun, R. Timme, 
Phylogenetic networks: Modeling, reconstructibility, and accuracy, IEEE T. Comput. Biol. 1 (1) 
(2004) 13-23. 

19. L. Nakhleh, J. Sun, T. Warnow, C. R. Linder, B. M. E. Moret, A. Tholse, Towards the development 
of computational tools for evaluating phylogenetic network reconstruction methods, in: Proc. 8th 
Pacific Symp. Biocomputing, 2003. 

20. L. Nakhleh, J. Sun, T. Warnow, C. R. Linder, B. M. E. Moret, A. Tholse, Towards the development 
of computational tools for evaluating phylogenetic network reconstruction methods, in: Proc. 8th 
Pacific Symp. Biocomputing, 2003. 

21. L. Nakhleh, T. Warnow, C. R. Linder, K. S. John, Reconstructing reticulate evolution in species: 
Theory and practice, J. Comput. Biol. 12 (6) (2005) 796-811. 

22. J. B. Phipps, Dendrogram topology, Syst. Zool. 20 (1971) 306-308. 

23. C. Semple, Hybridization networks, in: O. Gascuel, M. Steel (eds.), Reconstructing evolution: New
mathematical and computational advances, Oxford University Press, 2008, pp. 277-314.

24. Y. A. Smolenskii, A method for the linear recording of graphs, USSR Computational Mathematics 
and Mathematical Physics 2 (1963) 396-397. 

25. Y. S. Song, J. Hein, Constructing minimal ancestral recombination graphs, J. Comput. Biol. 12 (2) 
(2005) 147-169. 

26. M. A. Steel, D. Penny, Distributions of tree comparison metrics — some new results, Syst. Biol. 42 (2) 
(1993) 126-141. 

27. G. Valiente, Phylogenetic networks, course at the Int. Summer School on Bioinformatics and
Computational Biology Lipari (June 14-21, 2008).

28. L. Wang, K. Zhang, L. Zhang, Perfect phylogenetic networks with recombination, J. Comput. Biol. 
8 (1) (2001) 69-78. 

29. W. T. Williams, H. T. Clifford, On the comparison of two classifications of the same set of elements, 
Taxon 20 (4) (1971) 519-522. 

30. S. J. Willson, Restrictions on meaningful phylogenetic networks, contributed talk at the EMBO
Workshop on Current Challenges and Problems in Phylogenetics (Isaac Newton Institute for
Mathematical Sciences, Cambridge, UK, 3-7 September 2007).

31. S. J. Willson, Reconstruction of certain phylogenetic networks from the genomes at their leaves, J. 
Theor. Biol. 252 (2008) 338-349. 

32. S. M. Woolley, D. Posada, K. A. Crandall, A comparison of phylogenetic network methods using
computer simulation, PLoS ONE 3 (4) (2008) e1913.


